Optimum Linear Finite-Dimensional Estimator
of Signal Waveforms*


TADAO KASAMI†


Summary—This paper deals with the linear estimation of noise 
corrupted signal waveforms, from observation over a specified 
finite time-interval. Two new features are delineated in this paper: 
1) the estimating operator is assumed to be finite-dimensional, 
because economically realizable operators are finite-dimensional 
in general, and 2) the cost of observation and estimation is taken 
into account, and assumed to be dependent upon the dimension 
of the operator only. As a result, the optimum linear finite-dimen- 
sional operator that minimizes the risk (the sum of cost and average 
loss) is obtained for the case in which a quadratic loss function 
due to error is adopted. 


1. INTRODUCTION 


THIS PAPER is concerned with the following problem.
A signal process, $X(t)$, is corrupted by noise.
$Y(t)$, the signal corrupted by noise, is observed
continuously over a specified finite time-interval $I_2$.
Signal and noise are nonstationary in general. Then the
true signal waveform on a time-interval $I_1$ is to be estimated
on the basis of the observed values of $Y(t)$ over $I_2$.
Many authors have developed the nonstationary 
smoothing and prediction theory [2]-[10]. The present
approach is related to that of Middleton [9], David [4], 
Bendat [8], and Kramer-Mathews [7], among others. 
Where finite data are used, the operator selected should 
(based upon an appropriate criterion) be the one providing 
the best estimated signal waveform corresponding to the 
observed waveform. Middleton considered this problem 
from the viewpoint of decision theory and assumed that 
infinite numbers of linear time-invariant filters can be 
employed [9]. In most practical situations, however, the
operations that can be realized physically and economically
are finite-dimensional. By a finite-dimensional operator,¹
we mean an operator whose range is a finite-dimensional
space. (See Section III.) Also, the cost of
observation and estimation must be taken into account. 
In Section III a simple cost function will be introduced. 
In most situations there is little reliable information 
about the statistics of the waveforms of signal and noise 
except, perhaps, for the second-order properties, so that 
elaborate nonlinear estimates, requiring additional sta- 
tistical description of the signal and noise, cannot be 
applied. For this reason, and for the sake of simplicity,


* Received by the PGIT, May 6, 1960; revised manuscript 
received, October 14, 1960. The present paper is based on an earlier 
work of the author [1]. 

† Faculty of Engrg., Osaka University, Osaka, Japan.

1 A finite-dimensional operator is also called a degenerate operator 
or a dyad. 


a quadratic loss function due to error is utilized, and the 
discussion will be restricted to linear estimates. In Sections 
IV and V, the optimum linear finite-dimensional estimate 
that minimizes the risk (the sum of cost and average loss) 
will be derived. 


II. REPRESENTATION OF THE ORIGINAL SIGNAL
PROCESS AND THE OBSERVED SIGNAL PROCESS


Let $X(t)$ denote the original signal process on an interval
$I_1 = [0, T]$ $(0 < T < \infty)$, and let $Y(t)$ denote the observed
signal process (received signal corrupted by noise²) on an
interval $I_2 = [T_1, T_2]$ $(-\infty < T_1 < T_2 < \infty)$.

Without loss of generality, we may assume that

$$E\,X(t) = 0; \quad E\,Y(t) = 0, \tag{1}$$

where $E$ denotes the mathematical mean. Also, it is
assumed that: 


Oni) (2) 


X(t) is continuous in quadratic mean, 


aX peerco. 


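$Y(t)$ can be represented as follows,

$$Y(t) = Y_1(t) + Y_2(t), \tag{3}$$

$$E\,Y_1(t) = 0, \quad E\,Y_2(t) = 0, \quad (t \in I_2), \tag{4}$$

where $Y_1(t)$ is continuous in quadratic mean, and $Y_2(t)$
is either white Gaussian noise statistically independent
of $X(t)$ and $Y_1(t)$, or vanishes identically; and $K_x(t_1, t_2)$,
$K_y(t_1, t_2)$, $K_{y_1}(t_1, t_2)$, and $K_{xy}(t_1, t_2)$ are known and defined
by

$$\begin{aligned}
K_x(t_1, t_2) &= E\,X(t_1)X(t_2), && (t_1, t_2 \in I_1),\\
K_y(t_1, t_2) &= E\,Y(t_1)Y(t_2), && (t_1, t_2 \in I_2),\\
K_{y_1}(t_1, t_2) &= E\,Y_1(t_1)Y_1(t_2), && (t_1, t_2 \in I_2),\\
K_{xy}(t_1, t_2) &= E\,X(t_1)Y(t_2) = E\,X(t_1)Y_1(t_2), && (t_1 \in I_1,\ t_2 \in I_2).
\end{aligned} \tag{5}$$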


Since $K_x$ and $K_{y_1}$ are continuous non-negative-definite
functions, we shall use Mercer's theorem [12]. Then,

$$K_x(t_1, t_2) = \sum_m \mu_m \phi_m(t_1)\phi_m(t_2); \quad \mu_m \ge \mu_{m+1} > 0, \quad (t_1, t_2 \in I_1), \tag{6}$$

$$K_{y_1}(t_1, t_2) = \sum_m \{\sigma_m^{(1)}\}^2 \psi_m^{(1)}(t_1)\psi_m^{(1)}(t_2); \quad \sigma_m^{(1)} \ge \sigma_{m+1}^{(1)} > 0, \tag{7}$$

where the series converge absolutely and uniformly on
$I_1 \times I_1$ and $I_2 \times I_2$, respectively, and the continuous
functions $\phi_m$ [$\psi_m^{(1)}$] are eigenfunctions of $K_x$ ($K_{y_1}$) corresponding
to eigenvalues $\mu_m$ [$\{\sigma_m^{(1)}\}^2$]:

$$\int_0^T K_x(t_1, t_2)\phi_m(t_2)\,dt_2 = \mu_m \phi_m(t_1), \tag{8}$$

$$\int_{T_1}^{T_2} K_{y_1}(t_1, t_2)\psi_m^{(1)}(t_2)\,dt_2 = \{\sigma_m^{(1)}\}^2 \psi_m^{(1)}(t_1). \tag{9}$$

Eigenfunctions which correspond to (necessarily finitely)
multiple eigenvalues are written with distinct indexes,
and all eigenfunctions are orthonormalized:

$$\int_0^T \phi_m(t)\phi_n(t)\,dt = \delta_{mn}, \quad \int_{T_1}^{T_2} \psi_m^{(1)}(t)\psi_n^{(1)}(t)\,dt = \delta_{mn}. \tag{10}$$ ³

² $Y(t)$ may or may not be the signal plus noise.
T1 
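Then $X(t)$ and $Y_1(t)$ have the following orthogonal
decompositions (see [12]):

$$X(t) = \sum_n x_n \phi_n(t), \tag{11}$$

$$Y_1(t) = \sum_n y_n^{(1)} \psi_n^{(1)}(t), \tag{12}$$

with

$$E\,x_m x_n = \mu_m \delta_{mn}, \quad E\,y_m^{(1)} y_n^{(1)} = \{\sigma_m^{(1)}\}^2 \delta_{mn}. \tag{13}$$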
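The Mercer expansion (6) and the orthonormality condition (10) can be checked numerically on a grid. The following is a minimal sketch, not part of the original paper: the function name `kl_basis`, the midpoint (Nystrom) discretization, and the example kernel $e^{-|t_1 - t_2|}$ on $[0, 1]$ are all assumptions chosen for illustration.

```python
import numpy as np

def kl_basis(kernel, a, b, n_grid=400):
    """Approximate the Mercer eigenpairs of `kernel` on [a, b] (midpoint Nystrom rule)."""
    h = (b - a) / n_grid
    t = a + h * (np.arange(n_grid) + 0.5)      # midpoints of the grid cells
    K = kernel(t[:, None], t[None, :])         # samples K(t_i, t_j)
    evals, evecs = np.linalg.eigh(K * h)       # discretized integral operator
    order = np.argsort(evals)[::-1]            # decreasing eigenvalues, as in (6)
    mu = evals[order]
    phi = evecs[:, order] / np.sqrt(h)         # so that sum_i phi_m(t_i) phi_n(t_i) h = delta_mn
    return t, h, mu, phi

# Illustrative kernel (an assumption): K_x(t1, t2) = exp(-|t1 - t2|) on [0, 1].
t, h, mu, phi = kl_basis(lambda s, u: np.exp(-np.abs(s - u)), 0.0, 1.0)
gram = phi[:, :5].T @ phi[:, :5] * h           # discrete version of (10)
K_rebuilt = (phi * mu) @ phi.T                 # discrete version of (6)
```

On the grid, `gram` plays the role of the Kronecker delta in (10), and the sum of the `mu` values approximates $\int_0^T K_x(t, t)\,dt$.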


Let us define $H_1$ as the linear subspace of $L_2(I_1)$ spanned
by all $\phi_n$'s in (6). Note that

$$E\int_0^T X^2(t)\,dt = \sum_n E\,x_n^2 = \sum_n \mu_n < \infty.$$

Then we see that a sample of $X = (x_1, x_2, \cdots)$ belongs to
$H_1$ with probability 1. For the case in which $Y_2 = 0$
in (3), let $\psi_n = \psi_n^{(1)}$ and $y_n = y_n^{(1)}$, and let $H_2$ denote the
linear subspace of $L_2(I_2)$ spanned by all $\psi_n$'s. Then a
sample of $Y = (y_1, y_2, \cdots)$ belongs to $H_2$ with probability
1. Also in case $Y_2 \ne 0$, let $\{\psi_n(t)\}$ denote a complete
orthonormal set of functions of $L_2(I_2)$ which involves all
$\psi_n^{(1)}(t)$'s. We now put

$$\begin{aligned}
y_n &= y_n^{(1)} + y_n^{(2)},\\
y_n^{(1)} &= \int_{T_1}^{T_2} Y_1(t)\psi_n(t)\,dt,\\
y_n^{(2)} &= \sigma \int_{T_1}^{T_2} \psi_n(t)\,d_t x(t, \omega), \quad (\sigma > 0).
\end{aligned} \tag{14}$$

Here $x(t, \omega)$ is Wiener's Brownian Motion function [13],
and

$$E\,y_m^{(2)} y_n^{(2)} = \sigma^2 \delta_{mn}. \tag{15}$$
Then $Y(t)$ can be represented as a random vector
$Y = (y_1, y_2, \cdots)$. Let $H_2$ denote the space consisting of
all sample vectors of the random vector $Y$. (If $Y_2(t) \ne 0$,
$H_2$ is not a Hilbert space.) Now we shall define matrices
$U$, $V$, and $W$ as

$$U = (\mu_m \delta_{mn}), \quad 1 \le m, n \le \dim H_1 \ (= \text{the dimension of } H_1),$$

$$V = (\sigma_m^2 \delta_{mn}), \quad 1 \le m, n \le \dim H_2 \ (= \text{the dimension of } H_2),$$

$$W = (\rho_{mn}), \quad 1 \le m \le \dim H_1, \ 1 \le n \le \dim H_2,$$

where

$$\sigma_m^2 = E\,y_m^2 = E\{y_m^{(1)}\}^2 + E\{y_m^{(2)}\}^2 = E\{y_m^{(1)}\}^2 + \sigma^2 \ge \sigma^2, \tag{16}$$

$$\rho_{mn} = E\,x_m y_n.$$

Then $U$ and $V$ are positive-definite type (in the strict
sense) diagonal matrices.

³ $\delta_{mn}$ is the Kronecker $\delta$.


III. LINEAR FINITE-DIMENSIONAL ESTIMATOR


In Section II it was shown that $X(t)$ [$Y(t)$] can be represented
as a random vector

$$X = (x_1, x_2, \cdots) \quad [Y = (y_1, y_2, \cdots)]$$

and that samples of $X$ ($Y$) belong to $H_1$ ($H_2$) with probability 1.
Therefore, a waveform estimator can be regarded
as a mapping from $H_2$ into $H_1$. In general, a linear
$M$-dimensional estimate of $X(t)$, $\hat X(t)$, can be expressed in
the following form:

$$\hat X(t) = \sum_{k=1}^{M} \Gamma_k(Y)\,\Phi_k(t); \quad \Phi_k \in H_1, \tag{17}$$

where $\Gamma_k(Y)$ is a linear functional defined in the entire
space of $H_2$. It may be justified to assume that:

$$E\{\Gamma_k(Y)\}^2 < \infty, \quad (k = 1, 2, \cdots, M), \tag{18}$$

and $\Gamma_k(Y)$ can be represented as

$$\Gamma_k(Y) = \sum_n \gamma_{kn} y_n. \tag{19}$$ ⁴

From (15), (18), and (19), it follows that

$$\sum_n \gamma_{kn}^2 \sigma_n^2 < \infty. \tag{20}$$

As stated in Section V, $\Gamma_k$ satisfying conditions in (18)
and (19) can be realized within any given quadratic mean
error by the output at $T_2$ of a linear time-invariant filter.
Thus, the practical waveform estimator defined by (17)
consists of a parallel bank of $M$ linear realizable filters,
the outputs $\Gamma_k$'s of which are multiplied by $\Phi_k(t_1)$ and
added to give the estimate of the wave at $t_1$. By allowing
$t_1$ to vary in $[0, T]$, the entire waveform may be reproduced
(in estimate).
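The filter-bank structure described above can be written out for sampled data. This is a minimal sketch under assumptions: the observation is already reduced to a finite coordinate vector `y`, the functions $\Phi_k(t)$ are sampled on a time grid, and the name `finite_dim_estimate` and the random test data are hypothetical.

```python
import numpy as np

def finite_dim_estimate(gamma, Phi, y):
    """gamma: (M, N) weights gamma_kn; Phi: (M, n_t) samples of Phi_k(t); y: (N,) coordinates of Y."""
    Gamma = gamma @ y          # the M filter outputs Gamma_k, as in (19)
    return Gamma @ Phi         # weighted sum of the Phi_k, as in (17)

rng = np.random.default_rng(0)
M, N, n_t = 3, 8, 50
gamma = rng.standard_normal((M, N))
Phi = rng.standard_normal((M, n_t))
y = rng.standard_normal(N)

x_hat = finite_dim_estimate(gamma, Phi, y)
operator = Phi.T @ gamma       # the same estimator written as one (n_t, N) matrix
```

The matrix `operator` has rank at most `M`, which is the sense in which the estimator is finite-dimensional.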

Now let $H_\sigma$ denote the real vector space which is of
the same dimensions as $H_2$ is, and which consists of all
real sequences $z = (z_1, z_2, \cdots)$ such that

$$\sum_n \sigma_n^2 z_n^2 < \infty. \tag{21}$$

Further, let us define the inner product of $z$ with
$z' = (z_1', z_2', \cdots)$ by

$$(z, z') = \sum_n \sigma_n^2 z_n z_n'. \tag{22}$$

Then $H_\sigma$ can be regarded as a separable Hilbert space.
If we put

$$e_n = (0, \cdots, 0, 1, 0, \cdots) \quad (\text{1 in the } n\text{th place}), \tag{23}$$

then we get

$$z = \sum_n z_n e_n. \tag{24}$$

Let $\gamma_k$ be defined by

$$\gamma_k = \sum_n \gamma_{kn} e_n; \tag{25}$$

then (20) implies

$$\gamma_k \in H_\sigma, \quad (k = 1, 2, \cdots, M). \tag{26}$$

To simplify the expression, we shall use the notation

$$[\gamma_k, Y] = \sum_n \gamma_{kn} y_n. \tag{27}$$

By using these notations, (17) can be written as

$$\hat X = \sum_{k=1}^{M} [\gamma_k, Y]\,\Phi_k, \quad \Phi_k \in H_1, \ \gamma_k \in H_\sigma. \tag{28}$$ ⁵

By a linear finite-dimensional estimate, we shall mean an
estimate that can be expressed as (28).

⁴ $\sum_n$ converges in quadratic mean.

It is clear that the cost of observation and estimation
depends principally on $M$, the dimension of the estimator.
We may reasonably assume that the cost function $c$
depends only on $M$ and satisfies the following relations:

$$c(M) \le c(M + 1), \quad c(0) = 0; \quad c(\infty) = \infty, \tag{29}$$

and that the loss function due to error, $L$, is evaluated by

$$L = l \int_0^T \{X(t) - \hat X(t)\}^2\,dt = l\,\|X - \hat X\|^2, \tag{30}$$

where $l$ is a positive constant and $\|V\|$ denotes the norm
of a vector $V \in H_1$. Let us define the risk $r$ by

$$r = c(M) + E\,L. \tag{31}$$

In Section IV, the optimum estimate that minimizes the
risk is derived. (In what follows, it is assumed that if
$\hat X_1$ and $\hat X_2$ give the identical risk and $\hat X_1$ gives less average
loss than $\hat X_2$, then $\hat X_1$ is preferable.)


⁵ $\hat X$, $\phi_k$, $\psi_k$ and $\Phi_k$ denote, respectively, the vectors corresponding
to $\hat X(t)$, $\phi_k(t)$, $\psi_k(t)$, and $\Phi_k(t)$.

Note


1) "Finite-dimensional operator" in this paper corresponds
to what is called dyad or degenerate operator in
linear space theory [18]. In (28), it is essential that the
finite-dimensional operator can be represented only by a
finite number of vectors $\Phi_k$'s and $\gamma_k$'s. Generally speaking,
$\Phi_k$ ($\gamma_k$) is represented as an infinite linear sum of basis
$\{\phi_n\}$ ($\{\psi_n\}$). However, this is not essential. The bases $\{\phi_n\}$
and $\{\psi_n\}$ have been adopted simply for facilitating mathematical
handling. In the optimum operator, as will be
stated in Section V, $\Phi_k(t)$'s are identical with eigenfunctions
$\xi_k(t)$'s as defined by (54); $w_k(t)$'s which are $\gamma_k$'s expressed
in terms of time functions are equal to eigenfunctions
$w_{k0}(t)$'s as formally defined by (61).

2) In this paper, the "cost" does not include the expenses
for calculation connected with system design. The
"cost" comprises the expenses of real-time calculation,
or filters, adders, function-generators, etc., necessary for
realizing the operator obtained upon calculation. The
problem of the expenses for calculation is given no further
treatment here.

3) The assumption in (29) is an approximation introduced
to simplify the theoretical approach, but it is
probably an appropriate approximation. (Refer to Section VI.)

4) Where $M$ is restricted within $1 \le M \le M_0$, we may
put $c(M) = \infty$ for $M > M_0$.

5) When $c(M)$ is not given explicitly, we may regard
$M$ as a parameter and use the results stated in the proof
of the theorem (see Appendix) to know how the loss due
to error decreases with increasing $M$ by $\Delta M$.

6) The present problem stated in terms of vector spaces
may have a wider meaning. As an example, let us consider
the linear estimation of a vector $X = (x_1, \cdots, x_N)$ based
on an observed vector $Y = (y_1, \cdots, y_{N'})$. Suppose that
$N$ and $N'$ are large and the components of $Y$ ($X$) are
highly correlated. Then, if the estimate is to be obtained
within a given time, it is more reasonable to arrange the
data $Y$ in appropriate forms $\Gamma_k = \sum_n \gamma_{kn} y_n$ $(k = 1,
2, \cdots, M; \ M < N)$ and to estimate $X$ from $\Gamma_k$'s than to
estimate $X$ directly from $Y$.


IV. OPTIMUM LINEAR FINITE-DIMENSIONAL
ESTIMATOR I


Let us define an orthonormal basis of $H_\sigma$, $\{e_n'\}$, as
follows:

$$e_n' = \sigma_n^{-1} e_n. \tag{32}$$ ⁶

If we put

$$\gamma_{kn}' = \sigma_n \gamma_{kn}, \tag{33}$$

then

$$\gamma_k = \sum_n \gamma_{kn} e_n = \sum_n \gamma_{kn}' e_n'. \tag{34}$$

⁶ Note that $\sigma_n > 0$.

There is no loss of generality in making the assumption:

$$(\gamma_j, \gamma_k) = \delta_{jk}, \quad (j, k = 1, 2, \cdots, M), \tag{35}$$

since a linear transformation of $\gamma_k$'s in (28) induces only
linear transformation of $\Phi_k$'s. From (19), (22) and (35),
it follows that

$$E\,\Gamma_j \Gamma_k = \delta_{jk}, \quad (j, k = 1, 2, \cdots, M). \tag{36}$$

We shall use the basis $\{e_n'\}$ for $H_\sigma$ and the basis $\{\phi_n\}$
for $H_1$ throughout this section. Then $[\gamma_k, Y]$ becomes
$\sum_n \sigma_n^{-1} \gamma_{kn}' y_n$.

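Lemma 1:⁷ For a given positive integer $M$ $(\le \dim H_2)$
and given $\gamma_k$ $(k = 1, 2, \cdots, M)$ satisfying (35), $\Phi_k$'s which
minimize $E L$ of the linear finite-dimensional estimate $\hat X$
are given by

$$\Phi_k = W V^{-1/2} \gamma_k, \quad (k = 1, 2, \cdots, M), \tag{37}$$

where $V^{-1/2}$ is the diagonal matrix

$$V^{-1/2} = \mathrm{diag}\,(\sigma_1^{-1}, \sigma_2^{-1}, \cdots).$$

Here,⁸

$$E L = l \sum_n \mu_n - l \sum_{k=1}^{M} \|\Phi_k\|^2 = l \sum_n \mu_n - l \sum_{k=1}^{M} \gamma_k' V^{-1/2} W' W V^{-1/2} \gamma_k. \tag{38}$$

($\gamma_k'$ and $W'$ denote, respectively, the transposed vector of
$\gamma_k$ and the transposed matrix of $W$.)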
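Lemma 1 can be checked numerically in a finite-dimensional setting. The sketch below is an illustration under assumptions, not the author's procedure: it builds a random joint covariance for $(X, Y)$, rotates it so that $U$ and $V$ are diagonal, and evaluates $E L$ from second-order statistics only. The helper name `expected_loss` and the dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_x, n_y, M, l = 4, 6, 2, 1.0

# Random joint covariance of (X, Y), then bases in which U and V are diagonal.
A_mix = rng.standard_normal((n_x + n_y, n_x + n_y + 3))
C = A_mix @ A_mix.T / A_mix.shape[1]
mu, P = np.linalg.eigh(C[:n_x, :n_x])          # U = diag(mu)
sigma2, Q = np.linalg.eigh(C[n_x:, n_x:])      # V = diag(sigma2)
W = P.T @ C[:n_x, n_x:] @ Q                    # rho_mn = E x_m y_n

A = W / np.sqrt(sigma2)                        # W V^{-1/2}
gamma = np.linalg.qr(rng.standard_normal((n_y, M)))[0].T   # orthonormal rows, as in (35)

def expected_loss(Phi):
    """E L for X_hat = sum_k Gamma_k Phi_k with E Gamma_j Gamma_k = delta_jk."""
    target = gamma @ A.T                       # rows are E(Gamma_k x) = W V^{-1/2} gamma_k
    return l * (mu.sum() - 2.0 * np.sum(Phi * target) + np.sum(Phi ** 2))

Phi_opt = gamma @ A.T                          # the choice (37)
loss_opt = expected_loss(Phi_opt)
loss_perturbed = expected_loss(Phi_opt + 0.1 * rng.standard_normal(Phi_opt.shape))
```

With these quantities, `loss_opt` agrees with the first form of (38), and any perturbation of `Phi_opt` increases the loss.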
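Let $\gamma_k = e_k'$ in (37); then there results

$$\Phi_k = W V^{-1/2} e_k' = \sum_n \sigma_k^{-1} \rho_{nk}\,\phi_n.$$

Since $E L \ge 0$, it follows from (38) that

$$\sum_n \sum_{k=1}^{M} \sigma_k^{-2} \rho_{nk}^2 \le \sum_n \mu_n.$$

Note that $M$ in the above expression is arbitrary. Then
we find

$$\sum_{n=1}^{\dim H_1} \sum_{k=1}^{\dim H_2} \sigma_k^{-2} \rho_{nk}^2 \le \sum_n \mu_n < \infty. \tag{39}$$

Now, let $L_1$ denote the linear operator from $H_\sigma$ into
itself which is represented by the matrix $V^{-1/2} W' W V^{-1/2}$,
and let $L_2$ denote the linear operator from $H_1$ into itself
which is represented by the matrix $W V^{-1} W'$. Clearly,
$L_1$ and $L_2$ are symmetric and non-negative-definite types.
Also let us define the norm $\|A\|$ of a matrix $A = (a_{ik})$ by

$$\|A\|^2 = \sum_{i,k} |a_{ik}|^2.$$

By using the Cauchy inequality and inequality (39),
we obtain:

$$\begin{aligned}
\|V^{-1/2} W' W V^{-1/2}\|^2 &= \sum_{p,q} \Big(\sum_m \sigma_p^{-1} \rho_{mp} \rho_{mq} \sigma_q^{-1}\Big)^2\\
&\le \sum_{p,q} \Big(\sum_m \rho_{mp}^2 \sigma_p^{-2}\Big)\Big(\sum_m \rho_{mq}^2 \sigma_q^{-2}\Big)\\
&= \Big(\sum_{m,p} \rho_{mp}^2 \sigma_p^{-2}\Big)^2 < \infty,
\end{aligned} \tag{40}$$

$$\begin{aligned}
\|W V^{-1} W'\|^2 &= \sum_{m,n} \Big(\sum_p \rho_{mp} \sigma_p^{-2} \rho_{np}\Big)^2\\
&\le \sum_{m,n} \Big(\sum_p \rho_{mp}^2 \sigma_p^{-2}\Big)\Big(\sum_p \rho_{np}^2 \sigma_p^{-2}\Big)\\
&= \Big(\sum_{m,p} \rho_{mp}^2 \sigma_p^{-2}\Big)^2 < \infty.
\end{aligned} \tag{41}$$

Hence $L_1$ and $L_2$ are symmetric non-negative-definite-type
linear operators of the Hilbert-Schmidt type [16]. Consequently,
$L_1$ and $L_2$ have pure discrete non-negative
eigenvalues [14], [16] and the nonzero eigenvalues can be
arranged in order of decreasing magnitude:

$$L_1: \ \lambda_1 \ge \lambda_2 \ge \cdots > 0,$$

$$L_2: \ \lambda_1' \ge \lambda_2' \ge \cdots > 0.$$

⁷ For the proofs of lemmas and theorem, see the Appendix.
⁸ In Sections IV-VI, $\gamma_k$, $e_k$, $e_k'$, $\Phi_k$, etc., denote column vectors.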


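Let $\zeta_k$ ($\xi_k$) denote the orthonormal eigenvector of $L_1$ ($L_2$)
corresponding to $\lambda_k$ ($\lambda_k'$). That is,

$$V^{-1/2} W' W V^{-1/2} \zeta_k = \lambda_k \zeta_k, \tag{42}$$

$$(\zeta_j, \zeta_k) = \delta_{jk}, \tag{43}$$

$$W V^{-1} W' \xi_k = \lambda_k' \xi_k, \tag{44}$$

$$(\xi_j, \xi_k) = \delta_{jk}. \tag{45}$$

Then the following lemma is verified.

Lemma 2: For any index $k$,

$$\lambda_k = \lambda_k'. \tag{46}$$

And further, for nonzero eigenvalues $\lambda_k$'s, let $\xi_k$ ($\zeta_k$) be
defined by

$$\xi_k = \lambda_k^{-1/2} W V^{-1/2} \zeta_k, \tag{47}$$

$$\zeta_k = \lambda_k^{-1/2} V^{-1/2} W' \xi_k; \tag{48}$$

then $\xi_k$'s ($\zeta_k$'s) are orthonormal eigenvectors of $L_2$ ($L_1$)
corresponding to $\lambda_k$'s. The following theorem can be
derived from Lemmas 1 and 2.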
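Lemma 2 is the familiar statement that $A'A$ and $AA'$ share their nonzero eigenvalues, with $A = W V^{-1/2}$. A minimal numerical sketch follows; the random matrices and dimensions are arbitrary assumptions used only to exercise (46) and (47).

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_y = 4, 6
W = rng.standard_normal((n_x, n_y))            # stand-in for (rho_mn)
sigma2 = rng.uniform(0.5, 2.0, n_y)            # diagonal of V

A = W / np.sqrt(sigma2)                        # W V^{-1/2}
L1 = A.T @ A                                   # V^{-1/2} W' W V^{-1/2}, an (n_y, n_y) matrix
L2 = A @ A.T                                   # W V^{-1} W', an (n_x, n_x) matrix

lam1, zeta = np.linalg.eigh(L1)
lam2, xi = np.linalg.eigh(L2)
lam1, zeta = lam1[::-1], zeta[:, ::-1]         # decreasing order
lam2, xi = lam2[::-1], xi[:, ::-1]

r = n_x                                        # number of nonzero eigenvalues here
xi_from_zeta = (A @ zeta[:, :r]) / np.sqrt(lam1[:r])   # eq. (47), column by column
```

The columns of `xi_from_zeta` are orthonormal eigenvectors of `L2`; they may differ from the columns of `xi` by sign.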

Theorem: The optimum linear finite-dimensional estimate
$\hat X_{opt}$ that minimizes the risk is given by

$$\hat X_{opt} = \sum_{k=1}^{M_{opt}} \xi_k\,[V^{-1} W' \xi_k, Y] = \sum_{k=1}^{M_{opt}} \lambda_k^{1/2}\,\xi_k\,[V^{-1/2} \zeta_k, Y], \tag{49}$$

and the minimum risk $r_{min}$ is given by

$$r_{min} = c(M_{opt}) + l \sum_n \mu_n - l \sum_{k=1}^{M_{opt}} \lambda_k. \tag{50}$$

Here, $\xi_k$ ($\zeta_k$) is defined by (44) and (45) [(42) and (43)],
$\zeta_k$ ($\xi_k$) is defined by (48) [(47)], and $M_{opt}$ is the largest
of integers that minimize

$$c(M) - l \sum_{k=1}^{M} \lambda_k, \quad (0 \le M \le \dim H_2).$$

This theorem shows that our present problem to find the
optimum linear finite-dimensional estimate is reduced to
an eigenvalue problem.

⁹ Throughout this paper multiple eigenvalues are written with
distinct indexes.
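For a fixed $M$ and finite-dimensional vectors, the estimator of the theorem can be assembled directly from an eigendecomposition. The sketch below is an illustration under assumptions (random second-order model, hypothetical function names); it compares the resulting mean squared error with that of another rank-$M$ linear estimator, the least-squares estimate that uses only the first $M$ observation coordinates.

```python
import numpy as np

def optimum_estimator(sigma2, W, M):
    """Return (lam, B): eigenvalues of W V^{-1} W' and the matrix B with x_hat = B y."""
    A = W / np.sqrt(sigma2)                    # W V^{-1/2}
    lam, xi = np.linalg.eigh(A @ A.T)          # eigenpairs of W V^{-1} W'
    order = np.argsort(lam)[::-1]
    lam, xi = lam[order], xi[:, order]
    w = (W / sigma2).T @ xi[:, :M]             # columns V^{-1} W' xi_k
    return lam, xi[:, :M] @ w.T                # sum_k xi_k (V^{-1} W' xi_k)'

def mean_sq_error(mu, sigma2, W, B):
    """E ||X - B Y||^2 when U = diag(mu), V = diag(sigma2), W = E X Y'."""
    return mu.sum() - 2.0 * np.sum(B * W) + np.sum(B ** 2 * sigma2)

rng = np.random.default_rng(2)
n_x, n_y, M = 4, 6, 2
A_mix = rng.standard_normal((n_x + n_y, n_x + n_y + 3))
C = A_mix @ A_mix.T / A_mix.shape[1]           # random joint covariance (an assumption)
mu, P = np.linalg.eigh(C[:n_x, :n_x])
sigma2, Q = np.linalg.eigh(C[n_x:, n_x:])
W = P.T @ C[:n_x, n_x:] @ Q

lam, B_opt = optimum_estimator(sigma2, W, M)
mse_opt = mean_sq_error(mu, sigma2, W, B_opt)

B_alt = np.zeros_like(W)                       # another rank-M estimator for comparison
B_alt[:, :M] = W[:, :M] / sigma2[:M]
mse_alt = mean_sq_error(mu, sigma2, W, B_alt)
```

Here `mse_opt` equals $\sum_n \mu_n - \sum_{k \le M} \lambda_k$, which is the loss term of (50) divided by $l$.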

It is interesting to note that if $H_1$ is of finite dimensions,
say $N$, then

$$M_{opt} \le M_0 \le N \quad (M_0 \text{ being the rank of } W V^{-1} W'),$$

and if

$$c(M + 1) - c(M) \ge c(M) - c(M - 1), \quad (M = 1, 2, \cdots),$$

$M_{opt}$ is the largest of integers satisfying

$$l\,\lambda_M \ge c(M) - c(M - 1). \tag{51}$$
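The choice of $M_{opt}$ can be illustrated with a short search. In this sketch the eigenvalues and the quadratic cost function are made-up numbers, and the two function names are hypothetical; the first function applies the definition in the theorem, the second applies rule (51), which presumes non-decreasing cost increments.

```python
import numpy as np

def m_opt_brute(lam, cost, l=1.0):
    """Largest M minimizing c(M) - l * sum_{k<=M} lam_k over 0 <= M <= len(lam)."""
    obj = [cost(M) - l * np.sum(lam[:M]) for M in range(len(lam) + 1)]
    best = min(obj)
    return max(M for M, v in enumerate(obj) if np.isclose(v, best))

def m_opt_convex(lam, cost, l=1.0):
    """Largest M with l * lam_M >= c(M) - c(M-1), as in (51)."""
    ok = [M for M in range(1, len(lam) + 1)
          if l * lam[M - 1] >= cost(M) - cost(M - 1)]
    return max(ok, default=0)

lam = np.array([5.0, 3.0, 1.2, 0.5, 0.1])      # decreasing eigenvalues (illustrative)
cost = lambda M: 0.2 * M ** 2                  # convex cost with c(0) = 0 (illustrative)

M_brute = m_opt_brute(lam, cost)
M_rule = m_opt_convex(lam, cost)
```

For these numbers both routines select three filters: the fourth eigenvalue, 0.5, no longer covers the cost increment $c(4) - c(3) = 1.4$.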

V. OPTIMUM LINEAR FINITE-DIMENSIONAL
ESTIMATOR II


Here, the results obtained in Section IV will be written
in terms of time functions. $K_x(t_1, t_2)$, $K_y(t_1, t_2)$ and $K_{xy}(t_1, t_2)$
correspond to $U$, $V$, and $W$, respectively. Let $F(t_1, t_2)$,
$D(t_1, t_2)$ and $\xi_k(t)$ denote the functions corresponding
to $W V^{-1}$, $W V^{-1} W'$ and $\xi_k$, respectively. Then the following
expressions are easily obtained:

$$\int_{T_1}^{T_2} F(t_1, t) K_y(t, t_2)\,dt = K_{xy}(t_1, t_2), \tag{52}$$

$$D(t_1, t_2) = \int_{T_1}^{T_2} F(t_1, t) K_{xy}(t_2, t)\,dt, \tag{53}$$

$$\int_0^T D(t, \tau)\,\xi_k(\tau)\,d\tau = \lambda_k \xi_k(t), \tag{54}$$

$$\int_0^T \xi_j(t)\,\xi_k(t)\,dt = \delta_{jk}. \tag{55}$$

Eqs. (52) and (53) are formal expressions. Where $Y_2(t) = 0$
in (3), $F(t_1, t_2)$ may not be a function in the ordinary
meaning. $F(t_1, t_2)$ has been shown to be the kernel function
of the linear least-squares estimator without consideration
of cost [3], [8]. On the other hand, $D(t_1, t_2)$ belongs to
$L_2(I_1 \times I_1)$.

Corresponding to $\gamma_k$, let us define $w_k(t)$ formally by

$$w_k(T_2 - t) = \lim_{N \to \infty} w_k^{(N)}(T_2 - t) = \lim_{N \to \infty} \sum_{n=1}^{N} \gamma_{kn} \psi_n(t). \tag{56}$$

While $w_k^{(N)}$ is a continuous function, $w_k(t)$ may not be
an ordinary function where $Y_2 = 0$. Consider $Y_2 \ne 0$;
i.e., $\sigma \ne 0$. Then (15) shows that

$$\sigma_n \ge \sigma > 0.$$

From (22), it follows that $\gamma_k \in H_\sigma$ implies $w_k \in L_2(I_2)$.
According to (19), the expression below holds.

$$\Gamma_k = \lim_{N \to \infty} \int_{T_1}^{T_2} w_k^{(N)}(T_2 - t)\,Y(t)\,dt. \tag{57}$$ ¹⁰


Now, (56) implies that for $H_\sigma$ the basis $\{e_n\}$ is used
in place of $\{e_n'\}$. When the basis $\{e_n\}$ is used, $\zeta_k$ is expressed
as

$$\zeta_k = \lambda_k^{-1/2} V^{-1} W' \xi_k. \tag{58}$$

Thus, $w_{k0}(T_2 - t)$ corresponding to $\lambda_k^{1/2} \zeta_k$ is given by

$$w_{k0}(T_2 - t) = \int_0^T F(\tau, t)\,\xi_k(\tau)\,d\tau. \tag{59}$$

Consequently, the following corollary is easily obtained
from the theorem in Section IV.

Corollary: The optimum linear finite-dimensional estimate
$\hat X_{opt}(t)$ is given by

$$\hat X_{opt}(t) = \sum_{k=1}^{M_{opt}} \xi_k(t) \int_{T_1}^{T_2} w_{k0}(T_2 - \tau)\,Y(\tau)\,d\tau, \tag{60}$$

where $M_{opt}$ is defined in the theorem in Section IV. The
minimum risk $r_{min}$ is given by

$$r_{min} = c(M_{opt}) + l \int_0^T K_x(t, t)\,dt - l \sum_{k=1}^{M_{opt}} \lambda_k.$$


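Remark: From (44) and (58), we have

$$\begin{aligned}
W' W \lambda_k^{1/2} \zeta_k &= W' W V^{-1} W' \xi_k\\
&= \lambda_k V V^{-1} W' \xi_k\\
&= \lambda_k V \lambda_k^{1/2} \zeta_k,
\end{aligned}$$

i.e.,

$$\int_0^T \int_{T_1}^{T_2} K_{xy}(t, t_1) K_{xy}(t, t_2)\,w_{k0}(T_2 - t_2)\,dt_2\,dt = \lambda_k \int_{T_1}^{T_2} K_y(t_1, t_2)\,w_{k0}(T_2 - t_2)\,dt_2. \tag{61}$$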


VI. DISCUSSION AND EXAMPLES


As mentioned before, if $Y_2 = 0$, $w_{k0}(t)$ does not always
belong to $L_2$, and $w_{k0}(t)$ may involve $\delta(x - x_0)$, $\delta'(x - x_0)$,
etc. Sometimes $\xi_k(t)$ and $w_{k0}(t)$ can be obtained explicitly;
also, in some cases, numerical analysis is the only way of
finding $\xi_k(t)$ and $w_{k0}(t)$. The procedure of employing the
series $w_k^{(N)} = \sum_{n=1}^{N} \gamma_{kn} \psi_n(t)$ shown in (56) is one of those
applicable in the present case. As shown in (57), with an
appropriate $N$,

$$\int_{T_1}^{T_2} w_k^{(N)}(T_2 - t)\,Y(t)\,dt$$

approximates to $\Gamma_k$ within any given quadratic mean error.

¹⁰ Instead of the right-hand side of (57), let us write simply
$\int_{T_1}^{T_2} w_k(T_2 - t)\,Y(t)\,dt$.

When calculating for system design, $N$ is preferably
large enough so that the quadratic mean error due to
truncation will be covered by round-off error. In the
next stage, i.e., circuit design, approximation should be
carried out in as efficient a manner as possible.

Since $T_1$ and $T_2$ are assumed to be specified, $\Gamma_k$ can be
realized by a set of function-generator, multiplier, and
integrator. It is necessary here to consider the cost in
association with accuracy. $\Gamma_k$ can also be approximately
obtained by the output at $T_2$ of a linear filter composed
of a finite number of lumped constant elements. In this
case, we need to find such an appropriate real function
$w_k^*(t)$ as [19]:

Wilt) = 


Eese 10; (62) 


and at the same time 


T2 
i wi(T, — Y(t) dt 
0 


approximates to 


T2 
i well, = HY) di 
on the average. 

Here, the choice of a suitable set of $\{s_i\}$ is essential for
getting good approximation with relatively few terms.
The cost of the filter depends upon $p$, $R_i$'s, $s_i$'s and the
required accuracy of elements. Now, let r‘’’ denote the 
sum of the cost of this filter and the mean loss due to 
error derived from approximation. Let us consider the 
case where w,0(t) is obtained in the form of D> Yuwn. 
Even if N > N’, the realization cost of 


T2 
| wT, — Y(t) dt 
Tan 
is not always higher than that of 

T 2 
[ Wine (sub y(t) ae 
oT, 


with the same accuracy. In general, there is no relation
between the basis $\{\psi_n\}$ and the cost of filter. Even though
$\psi_n$ is in the form of


Se Res) > 0, (63) 
i=1 


there is not always an explicit relationship. For an
illustration, consider
Wyo(t) a ei, (s at Sni) 


= lim wyo’(2). 


N-© 


If $s$ is a moderate value, the realization cost of


To 
ii Ce Vb) OL 
T1 
is lower than that of


ie 
| wT, — HY (8) dt. 
T1 
Thus, it is difficult to formulate the relation between r‘” 
and w,9 to make it applicable to general instances. 
For a fixed $M$, let us define $\hat X_M$ as

$$\hat X_M = \sum_{k=1}^{M} \xi_k(t) \int_{T_1}^{T_2} w_{k0}(T_2 - \tau)\,Y(\tau)\,d\tau, \tag{64}$$

where $\xi_k(t)$ and $w_{k0}(t)$ are defined by (54) and (59),
respectively. Then, $\hat X_M$ yields the minimum value of
$E\,\|X - \hat X\|^2$. (Refer to proof of theorem in Appendix.)
Actually, it will be impossible to realize precisely $\hat X_M$.
Let $\hat X_M^*$ denote the operator which indicates the functioning
of real equipment, and let us assume that $\hat X_M^*$ is
expressed as

$$\hat X_M^* = \sum_{k=1}^{M} \xi_k^*(t) \int_{T_1}^{T_2} w_k^*(T_2 - \tau)\,Y(\tau)\,d\tau, \tag{65}$$

while denoting the cost of this equipment by $c_M^*$. Also,
let XY; and X,, be defined as 


Ree ee) i “ORT ee ee 
et st (66) 
X20 | espe vias 


If £*(t) and w(t) approximate to &,(t) and w,o(t), re- 
spectively, by neglecting infinitesimal terms of higher 
order we can obtain: 


FO Gre? eran eee BEG Ge S| 


=H |X ee | ar Xe rae 
he OX xs, NS Xe Xara ie (67) 
Since £, and wy,» minimize E || X — X |), 
VCD. G 8a Ne SG || 
=E\|X,-Xy |, 68) 
Bal X 9X a) a eee ae era 
=H || Ke— Xue lls 69) 


[For validity of (68), refer to the proof of Lemma 1. 
Notice that the sum of the first and second terms is equal 
to E || X — Xx ||’, and the third one is equal to 
E || X; — X,, ||? in the right-hand side of (85). Also, the 
validity of (69) is assured in a similar way.] 

The objective of circuit design is to minimize 


Pog = Opp AU WX, — Xp | Xe I 0) 


Strictly speaking, the minimum value of $r_M^*$ depends not
only on $M$, but also on $\xi_k(t)$'s and $w_{k0}(t)$'s; still, it is
almost impossible, as seen in the foregoing, to formulate
this dependence. On the other hand, it is clear that min
$r_M^*$ usually depends mostly upon $M$. Hence, as a first-order
approximation, it is logical to assume that min $r_M^*$
depends only on $M$. Then, we put

$$\min r_M^* = c(M).$$

Thus we can define $c(M)$ more precisely.

To summarize the foregoing, the author takes it as
appropriate to consider the problem at issue in two different
steps. 1) System design: find $M_{opt}$, $\hat X_{opt}$, $\xi_k(t)$'s and
$w_{k0}(t)$'s; 2) circuit design: choose the set of $\xi_k^*(t)$'s and
$w_k^*(t)$'s which approximate to $\xi_k(t)$'s and $w_{k0}(t)$'s,
respectively, and minimize $r_{M_{opt}}^*$.

In most cases, it is almost impossible to solve both steps
on the same level. In step 1), consequently, step 2) is
considered as represented simply by $c(M)$. If $c(M)$ cannot
be defined explicitly, $M_{opt}$ is to be decided by regarding
$c(M)$ as an unknown parameter and actually estimating
min $r_M^*$, $M$ being a number of figures within some suitable
range.

The object of this paper is to present a procedure for
finding a solution on the level of step 1).

As an example, we shall compare the optimum estimate 
in Sections IV and V with the estimate of the signal 
waveform such as is obtained by an appropriate linear 
interpolation based on X.(t:,), the linear least-square 
estimates of X(t,,) at M representative points t,, of J,. 
Here, X..(t,,) is given by 


REA Pee r Pia) YO.at 


A 1 


(71) 


We may assume that 
E || Xa(ty) ||? < @. 


Let us denote the above estimate by X,(t). Then X,(t) 
is clearly a linear finite-dimensional estimate in the 
meaning of Section III. We see that X, cannot yield less 
mean-squared error than X ,, defined by (64). 

Next, let us consider a simple case in which the signal
is contaminated by the addition of white Gaussian noise
statistically independent of the signal, i.e., in (3)

$$Y(t) = X(t) + Y_2(t), \tag{72}$$

and where the interval $I_2$ is identical with the interval
$I_1$. In this case, the following equations are readily derived:

$$V = U + \sigma^2 E \ (E: \text{unit matrix}), \quad W = U, \quad W V^{-1} W' = U (U + \sigma^2 E)^{-1} U. \tag{73}$$

Hence, we have

$$\phi_k(t) = \psi_k(t) = \xi_k(t), \tag{74}$$

$$w_{k0}(T - t) = \frac{\mu_k}{\mu_k + \sigma^2}\,\phi_k(t). \tag{75}$$

Also,

$$\lambda_k = \frac{\mu_k^2}{\mu_k + \sigma^2}. \tag{76}$$

Consequently, it follows from the corollary in Section V
that $\hat X_{opt}$ and $r_{min}$ are given by

$$\hat X_{opt}(t) = \sum_{k=1}^{M_{opt}} \frac{\mu_k}{\mu_k + \sigma^2}\,\phi_k(t) \int_0^T \phi_k(\tau)\,Y(\tau)\,d\tau, \tag{77}$$

$$r_{min} = c(M_{opt}) + l \int_0^T K_x(t, t)\,dt - l \sum_{k=1}^{M_{opt}} \frac{\mu_k^2}{\mu_k + \sigma^2}, \tag{78}$$

where

$$\int_0^T K_x(t, \tau)\,\phi_k(\tau)\,d\tau = \mu_k \phi_k(t). \tag{79}$$
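The white-noise case lends itself to a quick numerical check of (73) and (76). The following sketch uses made-up signal eigenvalues and a made-up noise level; only the matrix relations themselves come from the text.

```python
import numpy as np

mu = np.array([2.0, 1.0, 0.25, 0.05])          # signal eigenvalues mu_k (illustrative)
sigma2 = 0.5                                   # white-noise level sigma^2 (illustrative)

U = np.diag(mu)
V = U + sigma2 * np.eye(len(mu))               # V = U + sigma^2 E
W = U                                          # W = U
L2 = W @ np.linalg.inv(V) @ W.T                # W V^{-1} W' = U (U + sigma^2 E)^{-1} U

lam = np.sort(np.linalg.eigvalsh(L2))[::-1]    # eigenvalues in decreasing order
gain = mu / (mu + sigma2)                      # weight applied to the kth filter output in (77)
```

Because $\mu^2 / (\mu + \sigma^2)$ increases with $\mu$, the ordering of the $\lambda_k$ follows that of the $\mu_k$, and each `gain` lies strictly between 0 and 1.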


Particularly, if

$$K_x(t, \tau) = \sum_{k=1}^{N} B_k e^{-\beta_k |t - \tau|}, \quad (\beta_k > 0),$$

the solution of eigenvalue problem (79) has been found
[15], and the realization of filters is rather simple. For
example, the covariance function $K_x(t, \tau)$ is given by

$$K_x(t, \tau) = B e^{-\beta |t - \tau|}, \quad (\beta > 0).$$

Middleton [17] shows that

$$\mu_k = \frac{2B}{\beta (1 + q_k^2)}, \tag{80}$$

$$\phi_k(t) = C_k \cos \beta q_k \Big(t - \frac{T}{2}\Big) + D_k \sin \beta q_k \Big(t - \frac{T}{2}\Big),$$

where $q_k$ is a positive root of:

$$\tan \beta T q_k = \frac{2 q_k}{q_k^2 - 1}, \tag{81}$$

$q_1, q_2, \cdots$ are arranged in order of increasing magnitude,
and $C_k$, $D_k$ are constants.
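Equations (80) and (81) can be evaluated numerically and compared with a direct discretization of (79). The sketch below is an illustration under assumptions: (81) is solved in the denominator-free form $(q^2 - 1)\sin \beta T q - 2q \cos \beta T q = 0$ by a sign-change scan with bisection, the parameters $B = \beta = T = 1$ are arbitrary, and the function name is hypothetical.

```python
import numpy as np

def exp_kernel_spectrum(B, beta, T, n_roots, q_max=100.0, n_scan=100001):
    """First positive roots q_k of (81) and the eigenvalues mu_k of (80)."""
    a = beta * T
    def f(q):                                  # (81) multiplied through by its denominators
        return (q * q - 1.0) * np.sin(a * q) - 2.0 * q * np.cos(a * q)
    q = np.linspace(1e-6, q_max, n_scan)
    vals = f(q)
    brackets = np.nonzero(vals[:-1] * vals[1:] < 0)[0][:n_roots]
    roots = []
    for i in brackets:                         # bisection inside each sign change
        lo, hi = q[i], q[i + 1]
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if f(lo) * f(mid) <= 0.0:
                hi = mid
            else:
                lo = mid
        roots.append(0.5 * (lo + hi))
    q_k = np.array(roots)
    return q_k, 2.0 * B / (beta * (1.0 + q_k ** 2))

q_k, mu_k = exp_kernel_spectrum(1.0, 1.0, 1.0, 3)

# Direct check: midpoint discretization of (79) for K_x = exp(-|t - tau|) on [0, 1].
n = 400
h = 1.0 / n
t = (np.arange(n) + 0.5) * h
K = np.exp(-np.abs(t[:, None] - t[None, :]))
mu_numeric = np.sort(np.linalg.eigvalsh(K * h))[::-1][:3]
```

The two sets of eigenvalues should agree to within the discretization error of the grid.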

Suppose that $\beta T \approx 1$ or $\beta T > 1$ and that $BT \approx \sigma^2$
or $BT < \sigma^2$. Then from (80) and (81), it follows that

$$q_k \approx \frac{k\pi}{\beta T}, \quad \mu_k \ll \sigma^2,$$

except for the first few $k$'s. Therefore except for these
first $k$'s,

$$\lambda_k \approx \frac{\mu_k^2}{\sigma^2} \approx \frac{4 B^2 \beta^2 T^4}{\sigma^2 (k\pi)^4}.$$

Then a small number of filters is probably sufficient.
It will be noted that (even if there is not noise) finite-dimensional
estimators cannot produce accurate signal
waveforms. If $E\,|X(t) - Y(t)|^2 < \infty$, i.e., $Y_2 = 0$,
to circumvent this difficulty we may apply the results in
Sections IV and V to the estimation of $X(t) - Y(t)$
instead of $X(t)$ and adopt an identity operator plus a
finite-dimensional operator for an estimator.

VII. CONCLUSION


This paper has described a new approach to the estimation
of signal waveforms based on received data in a
specified finite interval. The extensions to more complicated
cost functions or to nonlinear finite-dimensional
operators remain to be done.¹¹


APPENDIX 


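Proof of Lemma 1


Let us put

$$\Phi_k = \sum_n a_{kn} \phi_n, \tag{82}$$

$$\hat X = \sum_n \hat x_n \phi_n; \tag{83}$$

then from (17) we have

$$\hat x_n = \sum_{k=1}^{M} \Gamma_k a_{kn}. \tag{84}$$

From (30)

$$L = l\,\|X - \hat X\|^2 = l \sum_n (x_n - \hat x_n)^2.$$

By substituting (84) into the above equation and using
$E\,\Gamma_j \Gamma_k = \delta_{jk}$, we obtain

$$\begin{aligned}
E L &= l \sum_n E\Big(x_n - \sum_{k=1}^{M} \Gamma_k a_{kn}\Big)^2\\
&= l \sum_n \mu_n - l \sum_n \sum_{k=1}^{M} \{E(\Gamma_k x_n)\}^2 + l \sum_n \sum_{k=1}^{M} \{a_{kn} - E(\Gamma_k x_n)\}^2.
\end{aligned} \tag{85}$$

The above expression indicates that $a_{kn}$ minimizing $E L$
is given by

$$a_{kn} = E(\Gamma_k x_n). \tag{86}$$

Substitute $\Gamma_k = \sum_m \sigma_m^{-1} \gamma_{km}' y_m$ into
(86); then we have

$$a_{kn} = E\Big\{\sum_m \sigma_m^{-1} \gamma_{km}' y_m x_n\Big\} = \sum_m \rho_{nm} \sigma_m^{-1} \gamma_{km}'. \tag{87}$$

This expression implies that

$$\Phi_k = W V^{-1/2} \gamma_k. \tag{88}$$

In this case, it follows from (85) and (86) that

$$\begin{aligned}
E L &= l \sum_n \mu_n - l \sum_n \sum_{k=1}^{M} a_{kn}^2\\
&= l \sum_n \mu_n - l \sum_{k=1}^{M} \|\Phi_k\|^2 = l \sum_n \mu_n - l \sum_{k=1}^{M} \gamma_k' V^{-1/2} W' W V^{-1/2} \gamma_k.
\end{aligned} \tag{89}$$

¹¹ Reference [1] also discusses minimax solutions.

Since $E L \ge 0$ and $\sum_n \mu_n < \infty$, (89) yields

$$\sum_{k=1}^{M} \|\Phi_k\|^2 \le \sum_n \mu_n < \infty.$$

This implies that

$$\Phi_k \in H_1, \quad (k = 1, 2, \cdots, M).$$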


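Proof of Lemma 2

Let $\xi_k$ be defined by

$$\xi_k = \lambda_k^{-1/2} W V^{-1/2} \zeta_k. \tag{90}$$

Using the Cauchy inequality and (39), there results

$$\|\xi_k\|^2 \le \lambda_k^{-1}\,\|\zeta_k\|^2 \sum_n \sum_m \sigma_m^{-2} \rho_{nm}^2 < \infty.$$

Hence

$$\xi_k \in H_1.$$

From (42), (43), and (90), it follows that

$$\begin{aligned}
W V^{-1} W' \xi_k &= \lambda_k^{-1/2} W V^{-1/2} (V^{-1/2} W' W V^{-1/2}) \zeta_k\\
&= \lambda_k^{1/2} W V^{-1/2} \zeta_k\\
&= \lambda_k \xi_k;
\end{aligned} \tag{91}$$

$$\begin{aligned}
(\xi_j, \xi_k) &= \lambda_j^{-1/2} \lambda_k^{-1/2} \zeta_j' V^{-1/2} W' W V^{-1/2} \zeta_k\\
&= \lambda_j^{-1/2} \lambda_k^{1/2} (\zeta_j, \zeta_k)\\
&= \delta_{jk}.
\end{aligned} \tag{92}$$

Now let $\zeta_k$ be defined by

$$\zeta_k = \lambda_k'^{-1/2} V^{-1/2} W' \xi_k.$$

By following the same procedure as above, we obtain

$$V^{-1/2} W' W V^{-1/2} \zeta_k = \lambda_k' \zeta_k, \tag{93}$$

$$(\zeta_j, \zeta_k) = \delta_{jk}. \tag{94}$$

From (42), (43), (93), and (94), it follows that for any
$\lambda_i$ and $\lambda_k$ $(i \ne k)$ there exist $\lambda_{i'}'$ and $\lambda_{k'}'$ such that

$$\lambda_i = \lambda_{i'}', \quad \lambda_k = \lambda_{k'}', \quad i' \ne k'.$$

Similarly, the converse relation holds. Hence

$$\lambda_k' = \lambda_k,$$

because $\{\lambda_k\}$ and $\{\lambda_k'\}$ are arranged in order of decreasing
magnitude.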
Proof of Theorem

Suppose that $M$ is fixed. Lemma 1 indicates that $\Phi_k$
must be given by

$$\Phi_k = W V^{-1/2} \gamma_k$$

in order to minimize the risk. From (31) and (38) it
follows that $\gamma_k$'s maximizing

$$\sum_{k=1}^{M} \gamma_k' V^{-1/2} W' W V^{-1/2} \gamma_k$$

yield the minimum risk for fixed $M$. Thus, the problem
is reduced to finding $\gamma_k$'s that maximize $\sum_{k=1}^{M} (\gamma_k, L_1 \gamma_k)$
under the restrictions $(\gamma_j, \gamma_k) = \delta_{jk}$. The solution of this
problem is well known in the theory of the eigenvalue
problem [14].¹² Let $H_{\sigma 0}$ denote the eigenspace corresponding
to the zero eigenvalue of $L_1$. (If $L_1$ has no zero eigenvalue,
$H_{\sigma 0}$ is a null space.) $\gamma_k$'s can be expressed in the
following form:


Vk aa se (Ye eae ae Vin0%5 Yxro £ 5. (95) 
Then we obtain 
M M 
23 (Ye; Lyx) = 53 rn Ds (Ye; fears 
k=1 n k=] 
Here‘® 
M 
» (Ye, Go ot || Ce ) 
k=1 
M 
Yo, 6) = Ella il = 
n c=] 
Now let us consider the problem of maximizing

$$\sum_n \lambda_n A_n, \quad (\lambda_n \ge \lambda_{n+1} > 0),$$

under the restrictions:

$$0 \le A_n \le 1, \quad \sum_n A_n = M.$$

We easily see that by letting $A_n = 1$ $(n = 1, 2, \cdots, M)$
and $A_n = 0$ $(n > M)$, we obtain the maximum value
$\sum_{n=1}^{M} \lambda_n$.

From this, it follows that


M 


E (ve, Live) S > Xx (96) 
k= 
where the equality holds if 
M 
Ds (Ye; eile =F Le (n 3 ie, 2 EL) 
k=1 
(Ye; Gs) = 0, (k = I, 2, Be Ba = M). 
That is, 02, (vz, Lyyz) is maximized by letting 
M 
Se a A (97) 


where (¢,;) is an arbitrary orthogonal matrix. Here the 
maximum value of 


M M 
ys (Ye, Lyyi) is >> Nee 
k=1 k=1 


¹² Reference [7] treats a similar case of finite dimensions.
¹³ Notice that $\gamma_k$'s are orthonormal.

In case $\lambda_M = \lambda_{M+1}$, $\zeta_k$ $(k = 1, 2, \cdots, M)$ are not uniquely
determined, but any choice of $\zeta_k$ achieves the desired
maximum value.
We consequently conclude that $\hat X$ minimizing the risk
for fixed $M$ is given by

M 


X= DIG We 


Since (t;,) is an orthogonal matrix, by using (47) X can 
be written as 
M M 
2 PaO S 
k=1 v= 
M 


= DS bales Vee 


7,7=1 


Pt 


ieseya = teW Vite 


M 


= Doli VI a -> ney 
k=1 
By letting ¢, = ¢, in (98), we obtain 
X = » Neer es. 


Now the risk is given by

$$r(M) = c(M) + l \sum_n \mu_n - l \sum_{k=1}^{M} \lambda_k.$$

If $r(M_1) = r(M_2)$ and $M_1 < M_2$, $M_2$ is preferable,
because $M_2$ gives less average loss than $M_1$. Therefore the
optimum $M$ is the largest of integers that minimize
$c(M) - l \sum_{k=1}^{M} \lambda_k$. Since $c(\infty) = \infty$ and $\sum_k \lambda_k < \infty$,
there exists finite $M_{opt}$. $M_{opt}$ satisfies the following
relations:

$$l\,\lambda_{M_{opt}+1} < c(M_{opt} + 1) - c(M_{opt}),$$

$$l\,\lambda_{M_{opt}} \ge c(M_{opt}) - c(M_{opt} - 1).$$

Suppose that

$$c(M + 1) - c(M) \ge c(M) - c(M - 1); \quad (M = 1, 2, \cdots).$$

Then, since $\lambda_M \ge \lambda_{M+1} > 0$, we easily see that $M_{opt}$ is the
largest of integers satisfying (51).
A Method of Digital Signalling in the Presence of
Additive Gaussian Noise*


LUDWIK KURZ†, MEMBER, IRE


Summary—This paper considers the basic problem of transmit- 
ting digital information through a noisy channel with minimum 
probability of error in finite time. The transmitted signals are 
average-power limited, and the noise is assumed to be additive 
Gaussian with a power spectrum which may be nonwhite. A theory 
of so-called efficient codes (minimax, equal separation, and nearly 
equal separation) is developed. Efficient codes are formed from 
weighted sums of eigenfunctions generated by an integral equation 
with its kernel corresponding to the inverse Fourier transform 
of the Gaussian noise power spectrum. In addition, the theory 
of equidistant and nearly equidistant codes [1] is extended to the 
case of nonwhite Gaussian noise. It is shown that efficient codes 
perform better than equidistant codes if the noise is nonwhite; i.e., 
properly chosen waveforms are more efficient than binary coding. 
Performance results are given for several different codes when 
the interference is white Gaussian noise and when the noise power 
density increases with increasing frequency. The detection scheme 
used does not require estimation of the signal or the noise levels 
at the receiver and is thus independent of fading. 
of the College of Engrg. of New York University in partial fulfillment
of the requirements for the degree of Doctor of Engineering Science. 

+ Research Division, College of Engrg., New York University, 
New York, N. Y. 


INTRODUCTION 


THE problem of detecting the presence or absence
of fixed signals transmitted through a channel
disturbed by additive Gaussian noise has received
wide attention [2], [8], [9], [15]. So far, few explicit results 
are available concerning the proper choice of signal forms 
to be used in minimizing error probability if digital, but 
not necessarily binary, communication is to be effected in 
finite time through a channel disturbed by additive non- 
white Gaussian noise so as to minimize the error prob- 
ability subject to the average-power limitation of the 
transmitter. 

The demodulated noise power spectrum is assumed to 
increase with increasing frequency. The latter assumption 
is justified by the following considerations. From one 
point of view, any practical transmitting station transmits 
in an allocated band of frequencies (channel). In the 
middle of this band the Gaussian noise has a flat power 
spectrum, while near the edges of the band the noise 
power spectrum rises due to adjacent channel interference. 
From another point of view, the assumed form of the
Gaussian noise power spectrum acts as a weighting func- 
tion which confines the signal to an available band of 
frequencies. 
The theory presented in this paper is an extension of the
theory developed by Chang, et al. [1], to combat additive 
white Gaussian noise in the channel. Several coding pro- 
cedures are discussed and the associated optimum de- 
tectors necessary to minimize the error probability of a 
unidirectional single-link communication system disturbed 
by additive nonwhite Gaussian noise, subject to average 
power limitation of the transmitter, are found. The main 
purpose of this paper, however, is to prepare background 
material for solution of a more important problem in 
digital communication—the optimization of a single-link 
unidirectional communication system in the presence of 
additive Gaussian and impulsive noise [7]. The latter
theory will be presented in a later paper.¹

For the purpose of analysis, synchronous detection of the RF signals
will be presumed; the advantage of this approach is that the problem
can be analyzed at video instead of RF. It will be assumed throughout
the paper that the noise bandwidth is infinite, since any necessary
filtering will be automatically included in the signal handling at the
receiver. As a matter of fact, if the noise bandwidth were finite, one
could always find a set of transmittable signals for theoretical zero
probability of error.² No restriction is placed on the bandwidth of the
signals; however, with the increase in rate of transmitted information
(smaller transmission time or more complicated waveshapes), larger
signal bandwidths are required.

The basic assumption in this paper of detection inde- 
pendent of fading is justified by the simplicity and 
practicability of the associated optimum receiver, and 
also because the resulting symmetric channel has a 
smaller error probability than the asymmetric channel 
for a fixed channel capacity [11]. 


THE OPTIMUM DETECTOR


Consider a unidirectional single-link communication system. One can
send through it a message of $m$ bits by transmitting with equal
a priori probability one of a set of time-limited signals $\{s_i(t)\}$,
$0 \le t \le T$, $i = 1, 2, 3, \ldots, 2^m$. Corresponding to the
transmission of some particular $s_i(t)$, a signal $y(t)$ is received.
In general, the receiver cannot, in finite time $T$, determine with zero
error probability to which particular transmitted signal the received
signal $y(t)$ corresponds, because of the additive Gaussian noise in the
channel [3], [4]. However, any receiver which retains the conditional
probabilities $p(s_i/y)$ that $s_i(t)$ was transmitted when $y(t)$ is
received will preserve all the information of the received signal [14].
When a decision must be made, and only one signal can be selected out
of $2^m$ possible signals, the best a receiver can do is select the
signal with the greatest a posteriori probability $p(s_i/y)$ [14]. This
receiver will be considered optimum.


1 See also [18] and [19]. 
2 Transmit the signals in the noise-free band of frequencies. 
To determine the conditional probability $p(s_i/y)$, expand both $y(t)$
and $s_i(t)$ into a generalized Fourier series [2] using the orthonormal
set $\{\varphi_k(t)\}$, generated by the integral equation

$$\sigma_k^2 \varphi_k(t) = \int_0^T K(x - t)\,\varphi_k(x)\,dx, \qquad 0 \le t \le T, \tag{1}$$

where $K(x)$ is the inverse Fourier transform of the noise power
spectrum. Then

$$s_i(t) = \sum_{k=1}^{N} a_{ik}\varphi_k(t), \tag{2}$$

where

$$a_{ik} = \int_0^T s_i(t)\varphi_k(t)\,dt,$$

and

$$y(t) = n(t) + s_i(t) = \sum_{k=1}^{N} y_k\varphi_k(t), \tag{3}$$

where

$$y_k = n_k + a_{ik} \tag{4}$$

and

$$n_k = \int_0^T n(t)\varphi_k(t)\,dt. \tag{5}$$

By the Karhunen-Loève theorem [6], all $n_k$ are Gaussian and
statistically independent with variances $\sigma_k^2$ and means zero;
therefore, all $y_k$ are Gaussian and statistically independent with
variances $\sigma_k^2$ and means $a_{ik}$.

Since the decision at the receiver is made so that the signal with the
largest a posteriori probability is accepted, the test reduces to
choosing $s_i(t)$ among a pair of signals $s_i(t)$ and $s_j(t)$, if

$$\sum_{k=1}^{N} \frac{y_k (a_{ik} - a_{jk})}{\sigma_k^2} > 0, \tag{6}$$

and $s_j(t)$ otherwise,³ assuming detection independent of fading,
namely,

$$a_{ik} = \pm a_{jk}. \tag{7}$$

Thus, the optimum receiver must determine

$$\Lambda_i = \sum_{k=1}^{N} \frac{y_k a_{ik}}{\sigma_k^2} \qquad \text{for all } i = 1, 2, \ldots, 2^m, \tag{8}$$

and accept the signal $s_i(t)$ which produces the largest $\Lambda_i$.
$\Lambda_i$ can easily be formed by crosscorrelating $y(t)$ with
$\sum_{k=1}^{N} [a_{ik}/\sigma_k^2]\varphi_k(t)$, which are known to the
receiver. Thus, our optimum receiver is just a simple crosscorrelator
between received signals and signals available at the receiver.
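The decision rule of (8) can be illustrated with a minimal sketch, assuming the expansion coefficients $y_k$ of the received signal have already been computed. The names `decide`, `signals`, `noise_var`, and `received`, and all numerical values, are hypothetical and chosen only for illustration.

```python
def decide(y, signals, noise_var):
    """Return (index, scores), where scores[i] = sum_k y[k] * a[i][k] / noise_var[k]."""
    scores = [
        sum(y_k * a_k / var_k for y_k, a_k, var_k in zip(y, a_i, noise_var))
        for a_i in signals
    ]
    # Accept the signal with the largest weighted correlation.
    best = max(range(len(signals)), key=scores.__getitem__)
    return best, scores


# Hypothetical example with two orthogonal digits and four signals whose
# coefficients satisfy a_ik = +/- a_jk, as required by (7).
signals = [[-1.0, -2.0], [-1.0, 2.0], [1.0, -2.0], [1.0, 2.0]]
noise_var = [1.0, 4.0]      # assumed noise variances of the two coefficients
received = [0.8, -1.5]      # a noisy version of signals[2]
index, scores = decide(received, signals, noise_var)
```

Here the third signal gives the largest score, so `index` is 2. The sketch works on coefficients only; forming them from waveforms requires the integrals in (2) and (5).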


3 See [17] or [19]. 
DETECTION ERROR, SEPARATION FUNCTION,
EFFICIENT CODES

Let $P_{ij}$ denote the error probability that signal $s_j(t)$ was
accepted when signal $s_i(t)$ had been sent, excluding all the other
signals. We shall call this error probability the detection error.
This error can be expressed as

$$P_{ij} = \frac{1}{2}\left[1 - \Phi\!\left(\frac{\sqrt{\xi_{ij}}}{\sqrt{2}}\right)\right], \tag{9}$$

where

$$\Phi(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-z^2}\,dz,$$

which is the tabulated error integral, and

$$\xi_{ij} = \frac{1}{2}\sum_{k=1}^{N} \frac{a_{ik}(a_{ik} - a_{jk})}{\sigma_k^2}. \tag{10}$$
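As a numerical illustration, the following sketch evaluates a detection error from a pair of coefficient vectors. The normalization is an assumption: the separation function is taken as $\xi_{ij} = \frac{1}{2}\sum_k a_{ik}(a_{ik} - a_{jk})/\sigma_k^2$ and the detection error as $P_{ij} = \frac{1}{2}[1 - \Phi(\sqrt{\xi_{ij}/2})]$ with $\Phi$ the error integral, so that two antipodal signals of energy $S^2T$ give $\xi = S^2T/\sigma_1^2$, as stated for the one-bit case. The function names and numbers are hypothetical.

```python
import math


def separation(a_i, a_j, noise_var):
    """Assumed form: xi_ij = (1/2) * sum_k a_ik * (a_ik - a_jk) / sigma_k^2."""
    return 0.5 * sum(ai * (ai - aj) / v for ai, aj, v in zip(a_i, a_j, noise_var))


def detection_error(xi):
    """Assumed form: P_ij = (1/2) * [1 - erf(sqrt(xi / 2))]."""
    return 0.5 * (1.0 - math.erf(math.sqrt(xi / 2.0)))


# Two signals that differ only in the sign of the first orthogonal digit.
xi = separation([1.0, 2.0], [-1.0, 2.0], [1.0, 4.0])
p = detection_error(xi)
```

Only the digit of opposite sign contributes to `xi`; the digit that the two signals share drops out, which is the behavior described for efficient codes.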


The assumption of detection independent of fading results also in a
symmetric operation of the system, namely,

$$P_{ij} = P_{ji}. \tag{11}$$

Since a symmetric channel with a given capacity has a smaller error
probability than an asymmetric channel with the same capacity, unless
the capacity of the channel is very low [11],⁴ the choice of detection
independent of fading results not only in a simple optimum receiver,
but assures also a symmetric channel with its associated small error
probability.

The condition expressed by (7), which assures both fading independent
detection and a symmetric channel, has the following physical
interpretation. Any signal $s_i(t)$ may be considered as consisting of
a sum of orthogonal digits⁵ independently affected by noise. A single
such digit may be represented by

$$d_{i\mu} = a_{i\mu}\varphi_\mu(t). \tag{12}$$

Since noise affects the digit through its energy and is insensitive to
the polarity, the digits $d_{i\mu}$ and $-d_{i\mu}$ will be affected by
noise in the same manner. Eq. (6) implies equal cost to the receiver of
error in signal $s_i(t)$ and $s_j(t)$, and this in turn signifies that
every corresponding orthogonal digit of signal $s_i(t)$ and signal
$s_j(t)$ is affected by noise in the same manner.

Thus, for a system with fading independent detection, each signal of
the set $\{s_i(t)\}$, $i = 1, 2, 3, \ldots, 2^m$ must have the same
number of orthogonal digits, $u$; for the

4 The proof in Silverman’s paper is for the binary channel, but
the same proof is valid for nonbinary channels if the correlation
among detection errors, $P_{ij}$, is negligible.

5 The term ‘‘orthogonal digit’’ is used to differentiate from the
usually used term binary digit. It is defined by (12).
same index of the orthogonal digit, $k = \mu$, the digits must be equal
in magnitude and be the same in form but can be opposite in sign. For
any two signals $s_i(t)$ and $s_j(t)$ of the set, $\lambda$ digits will
be of opposite sign and $(u - \lambda)$ digits will be of the same sign.
The function $\xi_{ij}$ will reduce to

$$\xi_\lambda = \sum_{k} \frac{a_{ik}^2}{\sigma_k^2}, \tag{13}$$

where the summation is to be taken over those $k$ for which
$a_{ik} = -a_{jk}$.

The larger $\xi_{ij}$, the smaller the detection error, $P_{ij}$;
therefore, it is logical to call $\xi_{ij}$ the separation function.
Any code having the separation function of the form (13) and consisting
of signals formed from sums of orthogonal digits is called an efficient
code.


METHODS OF CODING AND THE ASSOCIATED
ERROR PROBABILITY


It is desirable to select the set of signals $\{s_i(t)\}$ to minimize
the error probability $P_e$ subject to the average-power limitation of
the transmitter. The latter can be expressed mathematically as

$$\int_0^T s_i^2(t)\,dt = \sum_{k=1}^{N} a_{ik}^2 = S^2 T \qquad \text{for } i = 1, 2, \ldots, 2^m. \tag{14}$$

In the simple case of one bit of information ($m = 1$)

$$P_e = P_{12} = P_{21} = \frac{1}{2}\left[1 - \Phi\!\left(\frac{\sqrt{\xi_1}}{\sqrt{2}}\right)\right], \tag{15}$$

where $\xi_1 = S^2T/\sigma_1^2$, $s_1(t) = -s_2(t) = S\sqrt{T}\,\varphi_1(t)$,
and $\varphi_1(t)$ is the eigenfunction of the integral equation (1)
corresponding to the lowest eigenvalue $\sigma_1^2$.

If we attempt to develop a similar coding procedure for $m > 1$, we
run into some difficulties. The exact expression for $P_e$ cannot be
found unless the set of signals $\{s_i(t)\}$ is given, and if we
satisfy ourselves with minimization of the pessimistic expression⁶


PE Oe eee ee 
t=1 j7=1 


CFT 


(16) 


the problem is still too difficult to handle if one considers 
the functional form of P;;. 

Thus, we are forced to develop several efficient codes 
which are optimum in some sense and then compare 
their performance. 


The Minimax Code 


The minimax code is an efficient code which satisfies 
the following conditions: 


6 See [1] or [10]. 
1) Every signal of the set $\{s_i(t)\}$ is composed of a
   minimum number of orthogonal digits.

2) The waveforms of the orthogonal digits are the
   ordered eigenfunctions of (1). The eigenfunctions
   are selected in increasing order, starting from the
   lowest, until as many as necessary have been chosen.

3) The largest detection error $P_{ij}$ is minimized by
   proper selection of the coefficients $\{a_{ik}\}$.

Conditions 1) and 2) assure least interaction between signal and noise
in the channel. Condition 3) assures that the signals are resistant to
the most adverse action of the channel and gives rise to the name of
this code, minimax, or the code which minimizes the maximum detection
error. If $m$ bits of information are to be transmitted, the minimum
number of orthogonal digits in each signal is $m$, or $u = m$. The
maximum detection error will occur for any two signals $s_i(t)$ and
$s_j(t)$ when they differ by one orthogonal digit. Since this error
must be minimized subject to average-power limitation of the signal, it
means that every digit of any signal $s_i(t)$ must be affected by the
noise in the same manner, or for any $s_i(t)$ of the signal set
$\{s_i(t)\}$, $i = 1, 2, \ldots, 2^m$ the following conditions must be
satisfied:

$$\frac{a_{i1}^2}{\sigma_1^2} = \frac{a_{i2}^2}{\sigma_2^2} = \cdots = \frac{a_{im}^2}{\sigma_m^2} \tag{17}$$

and

$$\sum_{k=1}^{m} a_{ik}^2 = S^2 T. \tag{18}$$


Using (17), (18), and (12), one obtains the expression for
the $\mu$th orthogonal digit of the $i$th signal

$$d_{i\mu} = \pm a_\mu \varphi_\mu(t), \tag{19}$$

where

$$a_\mu = \frac{S\sqrt{T}\,\sigma_\mu}{\sqrt{\sum_{k=1}^{m} \sigma_k^2}}. \tag{20}$$

The set of signals $\{s_i(t)\}$ can now be found from the
matrix equation

$$M_m A_m \Phi_m = A. \tag{21}$$

The matrix $M_m$ is formed by writing all $2^m$ binary numbers in
ascending order from zero to $2^m - 1$ and replacing every zero by
$-1$. Obviously, the $M_m$ matrix is a $2^m \times m$ matrix. We shall
call the $M_m$ matrix the coding matrix of the minimax code.

The $A_m$ matrix is a square diagonal matrix with elements of it found
from (20) using the following rule

$$a_{\mu\nu} = \begin{cases} 0 & \text{for } \mu \ne \nu, \\ a_\mu & \text{for } \mu = \nu. \end{cases} \tag{22}$$

The $\Phi_m$ matrix is a column matrix consisting of $m$ rows and the
elements of the matrix are the eigenfunctions $\{\varphi_\mu(t)\}$,
$\mu = 1, 2, \ldots, m$, chosen to correspond to the eigenvalues
$\sigma_\mu^2$.

The $A$ matrix is a column matrix consisting of $2^m$ rows; the
elements of the rows are the signals $s_i(t)$, $i = 1, 2, \ldots, 2^m$.
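The construction of the minimax signal set can be sketched as follows, assuming the digit amplitudes are $a_\mu = S\sqrt{T}\,\sigma_\mu/\sqrt{\sum_k \sigma_k^2}$ so that $a_\mu^2/\sigma_\mu^2$ is the same for every digit and the total energy is $S^2T$. The function names and the noise variances are hypothetical; each row of `signal_coeffs` holds the coefficients $a_{ik}$ of one signal, that is, one row of the product of the coding matrix and the diagonal amplitude matrix.

```python
import math


def minimax_coefficients(S, T, noise_var):
    """Amplitudes a_mu with a_mu^2 / sigma_mu^2 constant and sum of a_mu^2 = S^2 * T."""
    total = sum(noise_var)
    return [S * math.sqrt(T) * math.sqrt(v / total) for v in noise_var]


def coding_matrix(m):
    """All 2^m binary numbers in ascending order, with every zero replaced by -1."""
    return [
        [1 if (n >> (m - 1 - bit)) & 1 else -1 for bit in range(m)]
        for n in range(2 ** m)
    ]


S, T = 2.0, 1.0
noise_var = [1.0, 3.0]      # assumed eigenvalues sigma_1^2, sigma_2^2
a = minimax_coefficients(S, T, noise_var)
M = coding_matrix(len(noise_var))
# Row i gives the signed coefficients of signal s_i(t).
signal_coeffs = [[sign * a_mu for sign, a_mu in zip(row, a)] for row in M]
```

With these values every signal has energy $S^2T = 4$, and the ratio $a_\mu^2/\sigma_\mu^2$ equals 1 for both digits.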

Using (13) and (20), the separation function for the
minimax code reduces to

$$\xi_\lambda = \frac{\lambda S^2 T}{\sum_{k=1}^{m} \sigma_k^2} \qquad \text{for } \lambda = 1, 2, \ldots, m. \tag{23}$$
The error probability for the minimax code is 
Pe le ee (24) 


where $P_{\lambda=1}$ is the detection error corresponding to the
separation function of (23) for $\lambda = 1$.


The Equal Separation Code 


Consider an efficient code with all the separation functions
$\xi_{ij}$ equal and the total separation

(25)

at its maximum. A code defined in this manner determines a stationary
point for the error probability.⁷ The equal separation code will
possess the following properties.


Le elas 


2) All separation functions $\xi_{ij}$ and all detection errors
   $P_{ij}$ are equal.


3) The total separation (25) is at its maximum.

The separation function for this code is of the form

$$\xi_{ij} = \frac{\lambda S^2 T}{\sum_{k=1}^{u} \sigma_k^2}, \tag{26}$$

where $\lambda$ and $u$ are fixed and independent of $i$ and $j$.
Thus, to satisfy conditions 1), 2), and 3), $u$ and $\lambda$ must be
the same for all signals of the set $\{s_i(t)\}$; $u$ must be as small
as possible and $\lambda$ as large as possible.⁸

The set of signals $\{s_i(t)\}$ can now be found from the
matrix equation

$$M_S A_S \Phi_S = A. \tag{27}$$

Matrices $A_S$ and $\Phi_S$ are of the same form as matrices $A_m$ and
$\Phi_m$, only the index $\mu$ varies up to $u$ and not $m$, namely,
$\mu = 1, 2, \ldots, u$. The matrix $A$ remains the same as for the
minimax code. It now remains to form the coding matrix $M_S$. This
matrix must possess the following properties.

1) It must be a $2^m \times u$ matrix.

2) The number of columns, $u$, must be a minimum,
   and $\lambda$ a maximum.

3) Each element of the matrix must be either a plus
   or a minus one.


7 The proof of this statement is identical with the proof given
for equidistant codes in Appendix B of [1].

8 This will assure maximum separation among signals.


4) Each row must differ from any other row by the
   same number of elements $\lambda$ and have $(u - \lambda)$ of
   the same elements.⁹


The $M_S$ matrix is generated using Slepian’s [13] $(2^m - 1, m)$
group alphabet.¹⁰ The code is constructed by writing each word number
and digit number in binary notation from 1 to $2^m - 1$. The first
generator is then formed from the unit digits of each digit number,
the second generator from the next place digits, etc. The other code
words are determined from the generators in conventional fashion by
addition modulo two of the corresponding digits.

Table I shows the generation of the code words when $m = 3$.


TABLE I

                        Digit Number
                    0  0  0  1  1  1  1
                    0  1  1  0  0  1  1
    Word Number     1  0  1  0  1  0  1
      0  0  1       1  0  1  0  1  0  1     first generator
      0  1  0       0  1  1  0  0  1  1     second generator
      0  1  1       1  1  0  0  1  1  0
      1  0  0       0  0  0  1  1  1  1     third generator
      1  0  1       1  0  1  1  0  1  0
      1  1  0       0  1  1  1  1  0  0
      1  1  1       1  1  0  1  0  0  1
      0  0  0       0  0  0  0  0  0  0


The matrix $M_S$ is now formed by taking Slepian’s $(2^m - 1, m)$
group code and replacing every zero with minus one. The resulting
matrix will be a $2^m \times (2^m - 1)$ matrix in which every row
differs from every other row in the matrix by $2^{m-1}$ elements.
Eq. (26) will now reduce to

$$\xi_S = \frac{2^{m-1} S^2 T}{\sum_{k=1}^{2^m - 1} \sigma_k^2}. \tag{28}$$
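The generation procedure described for Table I can be checked with a short sketch. It builds each generator from one binary place of the digit numbers, forms every code word as a modulo-two sum of generators selected by the binary form of the word number, replaces zeros by minus ones, and collects the number of elements in which each pair of rows differs. The function and variable names are illustrative only.

```python
def slepian_codewords(m):
    """Code words of the (2^m - 1, m) group alphabet, indexed by word number."""
    n = 2 ** m - 1
    # Generator g holds, for each digit number d = 1..n, binary place g of d.
    generators = [[(d >> g) & 1 for d in range(1, n + 1)] for g in range(m)]
    words = []
    for w in range(2 ** m):
        word = [0] * n
        for g in range(m):
            if (w >> g) & 1:    # generator g takes part in word number w
                word = [(x + y) % 2 for x, y in zip(word, generators[g])]
        words.append(word)
    return words


def to_coding_matrix(words):
    """Replace every zero by -1."""
    return [[1 if bit else -1 for bit in word] for word in words]


m = 3
words = slepian_codewords(m)
M_S = to_coding_matrix(words)
# Set of pairwise row differences; a single value means equal separation.
distances = {
    sum(x != y for x, y in zip(r1, r2))
    for i, r1 in enumerate(M_S)
    for r2 in M_S[i + 1:]
}
```

For $m = 3$ the set `distances` contains the single value 4, in agreement with the stated difference of $2^{m-1}$ elements between any two rows.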


The corresponding expression for the error probability
is


Were Gee lerees) is (29) 


and $P_S$ is the detection error corresponding to the separation
function (28).


The Equidistant Code 


Since the codes developed so far are only optimum in a certain sense,
it is advisable to investigate some other possibilities of coding. One
such possibility is an extension of the theory of equidistant codes,
as developed by Chang, et al. [1] to combat additive white Gaussian
noise in the channel, to the case of nonwhite Gaussian noise. Consider
binary digital coding, namely, coding into pulse trains consisting of
identically shaped pulses both positive and


9 This property of the matrix will assure equal separation property
of the code.

10 See [1] or [16], code Type 10’. A detailed discussion is given in
[17] and [19].
negative. Since $P_{ij} = P_{ji}$ and the signal set $\{s_i(t)\}$ is
average-power limited, each signal of the set must consist of the same
number of binary digits, $u$. It is obvious that if the decision time
is fixed at some value $T$ and if there are $u$ digits in each signal,
then the pulse shape for the $k$th digit of the $j$th signal must be
of the form

$$d_{jk} = \pm S\sqrt{T/u}\;\psi_1(t), \tag{30}$$

where $\psi_1(t)$ is the eigenfunction corresponding to the lowest
eigenvalue $\tau_1^2$ of the integral equation (1) with $T$ replaced by
$T/u$.

The coding matrix $M_D$ will consist of plus and minus ones; it will
be a $2^m \times u$ matrix. The $D$ matrix whose elements are the
binary digits of the signal set $\{s_i(t)\}$ may be expressed as

$$D = a\,\psi_1(t)\,M_D, \tag{31}$$

where

$$a = S\sqrt{T/u}. \tag{32}$$
Let us define a new function 
1 T 
Du = gam | b — s(OF at, (33) 


which is the distance between any two signals s;(¢) and 
s;(t) of the signal set {s;(¢)}. Let the 7th and jth rows of 
the matrix D differ by \ binary digits, then 


Diy 


S1> 


Consider now a system of coding for which all the distances $D_{ij}$
are equal and the total distance $D_T$ as defined by

$$D_T = \sum_{i=1}^{2^m} \sum_{\substack{j=1 \\ i \ne j}}^{2^m} D_{ij} \tag{34}$$

is at its maximum. Such a code determines a stationary point for the
error probability.⁷ The coding matrix $M_D$ for the equidistant code
will be identical with the coding matrix for the equal separation
code, or $\lambda = 2^{m-1}$ and $u = 2^m - 1$, and the $M_D$ matrix is
a Slepian $(2^m - 1, m)$ group alphabet with every zero replaced by
minus one.

Using the same procedure as for orthogonal digit coding, the
separation function of the equidistant code can be expressed as

$$\xi_D = \frac{2^{m-1} S^2 T}{(2^m - 1)\,\tau_1^2}. \tag{35}$$

The detection error and the corresponding error probability are found
as for the equal separation code.


Other Codes 


The three types of codes discussed so far do not exhaust 
the possible schemes of coding one can develop. For 
instance, one can develop nearly equidistant and nearly 
equal separation codes which possess certain advantages. 
1) Nearly Equal Separation Codes: The coding matrix
for these codes also is formed from Slepian group alpha- 
bets, but inequality in the distances between sequences is 
permitted, namely, 


— (36) 


Obtained in this way, signal set {s,(¢)} has some inequality 
in separation functions, but the total separation remains 
at a maximum value. These codes will require fewer 
orthogonal digits per bit of information transmitted, and 
their performance will be better than that of other codes 
for moderate information rates and moderate increase 
of noise power spectrum with frequency. 

2) Nearly Equidistant Codes: The nearly equidistant 
codes are an extension of the equidistant code theory in 
the same manner as the nearly equal separation codes are 
an extension of the equal separation code theory. The 
coding matrix for the nearly equidistant codes is the same 
as for corresponding equal separation codes. 

The nearly equidistant codes permit a reduction in the 
number of binary digits per bit with moderate increase 
in error probability P,, which results in a reduction in the 
equipment cost in return for a slight increase in the error 
probability if the noise power spectrum is white. 

If the noise power spectrum increases with frequency 
and the decision time remains constant, reduction in the 
number of binary digits per bit for the nearly equidistant 
codes may result in an actual decrease in the error prob- 
ability. 

The expression for the error probability for codes 1) and 
2) is of the form 


Pot =| — 


=i 


(37) 


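where $i$ is the lowest and $j$ the highest weight of a row in the
coding matrix, excluding the identity vector; $\gamma_\lambda$ is the
number of rows in the coding matrix of weight $\lambda$. The constant
$\gamma_\lambda$ must satisfy

$$\sum_{\lambda=i}^{j} \gamma_\lambda = 2^m - 1. \tag{38}$$

For the nearly equal separation codes, the separation function is

$$\xi_\lambda = \frac{\lambda S^2 T}{\sum_{k=1}^{u} \sigma_k^2}, \qquad \lambda = i, i+1, \ldots, j. \tag{39}$$

For the nearly equidistant codes, the separation function is

$$\xi_\lambda = \frac{\lambda S^2 T}{u\,\tau_1^2}, \qquad \lambda = i, i+1, \ldots, j. \tag{40}$$

The detection errors $P_\lambda$ are found in both cases in the
usual manner.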
COMPARISON OF VARIOUS CODES

One realizes at once that to minimize the error probability for any of
the codes described above, the separation functions must be made as
large as possible; zero error probability can only be achieved either
for $T \to \infty$ or $S^2 \to \infty$. For a finite decision time $T$
and finite average power of the transmitter $S^2$, the error
probability cannot be reduced to zero, which agrees with the theory
developed by Shannon [12] and refined by Feinstein [4] and Elias [3].

Let us now study the influence of change in $S^2$ and $T$ for various
codes and different power spectra of noise.


White Gaussian Noise, $\Phi_n(j\omega) = A^2$

In this case, (1) will reduce respectively to

$$\sigma_k^2 \varphi_k(t) = A^2 \varphi_k(t) \tag{41}$$

for orthogonal-digit coding and

$$\tau_k^2 \psi_k(t) = A^2 \psi_k(t) \tag{42}$$

for binary digit coding. Thus

$$\sigma_k^2 = \tau_k^2 = A^2 \tag{43}$$

for all $k$. Any orthonormal set on $(0, T)$ or $(0, T/u)$ will do.
This permits an infinite choice of signal forms for coding because one
can use any orthonormal set in the interval $(0, T)$ or $(0, T/u)$ to
generate a particular code. The choice of any specific orthonormal set
will be governed mainly by the bandwidth requirements and ease of
generation of the resulting signal set $\{s_i(t)\}$.

From (43), (28), and (35), one can see that in the white noise case
the separation function will be the same for equidistant and equal
separation codes, so that both codes are equally good in minimizing
the error probability. One would probably prefer to use the
equidistant code because the detector for it is quite simple.¹¹

In this case, the error probability for the minimax code is the same
as for the binary uncoded case. From the form of the separation
functions, it is obvious that for infinite bandwidth, increase in
average power $S^2$ and decision time $T$ are equally successful in
decreasing the error probability for all codes.


Gaussian Noise with the Power Spectrum of the Form
$\Phi_n(j\omega) = A^2 + B^2\omega^2$

The eigenvalues for this type of noise are¹²

$$\sigma_k^2 = A^2 + \frac{k^2 \pi^2 B^2}{T^2} \tag{44}$$

$$\tau_k^2 = A^2 + \frac{k^2 u^2 \pi^2 B^2}{T^2}. \tag{45}$$


11 A simple detector for the equidistant code can be found in
[1], p. 22; a modified version of it is shown in [18] and [19].

12 See Appendix.
If one examines the expressions for the separation functions and (44)
and (45), one can see that increase in decision time $T$ is much more
effective than increase in average power of the transmitter $S^2$ in
increasing the value of the separation function for all codes, thus
decreasing the probability of error $P_e$.

Fig. 1 and 2 show the behavior of $P_e$ as a function of $T$ for
various codes and fixed values of $S^2$, $\Phi_n(j\omega)$, and $m$.¹³
One can conclude from these figures that for high information rate and
noise power spectrum that increases rapidly with increasing frequency,
the minimax code performs best. On the other hand, for low information
rate and noise power spectrum that increases slowly with increasing
frequency, the equal separation code performs best. The equidistant
code is always poorer than the equal separation code. This can be
verified mathematically by means of (28) and (35). Thus,

$$\frac{\xi_S}{\xi_D} = \frac{u\,\tau_1^2}{\sum_{k=1}^{u} \sigma_k^2} \ge 1. \tag{46}$$

The equality sign holds only for white Gaussian noise ($B^2 = 0$).
Using (44) and (45)

$$\frac{\tau_1^2}{\sigma_k^2} \ge 1, \qquad k = 1, 2, \ldots, u, \tag{47}$$

and then from (46) and (47), it follows that

$$\xi_S \ge \xi_D, \tag{48}$$

demonstrating that the equal separation code has a larger separation
function (lower error probability) than the equidistant code if the
conditions for transmission are the same in both cases.

If $B^2$ and $m$ are small and $T$ is large, the performance of the
equidistant code approaches that of the equal separation code. Since
the detector for the equidistant code is quite simple, it may be
preferable to use the equidistant code for low information rate and
noise power spectrum that increases slowly with increasing frequency.
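The comparison between the equal separation and equidistant codes can be explored numerically. The sketch below assumes the eigenvalues $\sigma_k^2 = A^2 + k^2\pi^2B^2/T^2$ and $\tau_k^2 = A^2 + k^2u^2\pi^2B^2/T^2$ derived in the Appendix, and takes the ratio of separation functions as $u\tau_1^2/\sum_{k=1}^{u}\sigma_k^2$ with $u = 2^m - 1$. The function names and the parameter values are hypothetical.

```python
import math


def sigma2(k, A2, B2, T):
    """Eigenvalue for orthogonal-digit coding on (0, T)."""
    return A2 + B2 * (k * math.pi / T) ** 2


def tau2(k, A2, B2, T, u):
    """Eigenvalue for binary-digit coding on (0, T/u)."""
    return A2 + B2 * (k * u * math.pi / T) ** 2


def separation_ratio(m, A2, B2, T):
    """Assumed ratio xi_S / xi_D = u * tau_1^2 / sum_{k=1}^{u} sigma_k^2, u = 2^m - 1."""
    u = 2 ** m - 1
    total = sum(sigma2(k, A2, B2, T) for k in range(1, u + 1))
    return u * tau2(1, A2, B2, T, u) / total


white = separation_ratio(m=3, A2=1.0, B2=0.0, T=1.0)       # white noise
coloured = separation_ratio(m=3, A2=1.0, B2=0.01, T=1.0)   # rising spectrum
long_T = separation_ratio(m=3, A2=1.0, B2=0.01, T=10.0)    # longer decision time
```

Under these assumptions the ratio equals one for white noise, exceeds one when the spectrum rises with frequency, and moves back toward one as the decision time grows, which matches the qualitative statements above.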


INCREASE IN THE ERROR PROBABILITY $P_e$
DUE TO FINITE BANDWIDTH OF THE SYSTEM

Up to this point, it has been assumed that the bandwidth of the system
is infinite. In an actual system, every signal $s_i(t)$ of the set
$\{s_i(t)\}$ will not only be corrupted by additive Gaussian noise but
also distorted because the bandwidth of the system is finite. This
will give rise to an increase in the error probability due to
interdigit crosstalk. The actual received signal, excluding the effect
of noise, is of the form

$$z_i(t) = \sum_{k} b_{ik}\varphi_k(t), \tag{49}$$

where $b_{ik} \ne a_{ik}$.


13 Note that the information rate is $m/T$.
Fig. 1. Error probability, $P_e$, as a function of normalized decision time, $T$.

Fig. 2. Error probability as a function of normalized decision time.


If the bandwidth of the system is much larger than the bandwidth of
the signal, $b_{ik} \approx a_{ik}$, and the rise in error probability
is negligible. In general, when $b_{ik}$ differs appreciably from
$a_{ik}$, the symmetry of the system will be disturbed and the error
probability will increase. The actual error probability can easily be
found by using $b_{ik}$ instead of $a_{ik}$ throughout the analysis.

The increase in error probability can be partially reduced if one uses
at the detector signals composed of orthogonal digits with $b_{ik}$
replacing $a_{ik}$ for orthogonal digit coding and a pulse of the form
$S\sqrt{T/u}\,f_1(t)$, where $f_1(t)$ corresponds to the distorted form
of $\psi_1(t)$ for $0 \le t \le T/u$. The distortion is caused by the
finite bandwidth of the system.


APPENDIX¹⁴


EIGENVALUES AND EIGENFUNCTIONS OF THE
INTEGRAL EQUATION

$$\sigma_k^2 \varphi_k(t) = \int_0^T \varphi_k(x) K(x - t)\,dx,$$

where

$$K(x) = \mathcal{F}^{-1}[A^2 + B^2\omega^2].$$

Since the power spectrum $\Phi_n(j\omega) = A^2 + B^2\omega^2$ is given,
the inverse Fourier transform of it, $K(x)$, which forms the kernel of
the integral equation, may be found, namely,

$$K(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} [A^2 + B^2\omega^2]\,e^{j\omega x}\,d\omega = A^2\delta(x) - B^2\delta''(x), \tag{50}$$

where $\delta(x)$ is the Dirac delta function and $\delta''(x)$ its
second derivative. Substituting from (50) into the integral equation,
we obtain

$$B^2\frac{d^2\varphi_k(t)}{dt^2} + (\sigma_k^2 - A^2)\varphi_k(t) = 0. \tag{51}$$

The set of functions $\{\varphi_k(t)\}$ which satisfies the
differential equation (51) and the conditions

$$\int_0^T \varphi_i(t)\varphi_j(t)\,dt = \delta_{ij}, \tag{52}$$

where $\delta_{ij}$ is the Kronecker delta function, and

$$\varphi_k(0) = \varphi_k(T) = 0, \tag{53}$$

is the set of eigenfunctions used for coding and the corresponding
$\sigma_k^2$ are the eigenvalues.

The fact that the set $\{\varphi_k(t)\}$ is a solution to (51) assures
that it is a solution of the integral equation. The condition
expressed by (52) assures that the set is orthonormal,

14 The procedure in this Appendix is essentially the same as in
Appendix 2 of [2]. It can be justified using the theory of distributions
in [20]-[23]; namely, the set of functions $\{\varphi_k(t)\}$ generated
by (51) belongs to the class D (see [20]) and $K(x)$ is a symbolic
function (see [21]).
and the condition expressed by (53) will keep the switching impulses
to a minimum, thus decreasing the unwanted impulsive noise.

Let us rewrite (51) in the form

$$\frac{d^2\varphi_k(t)}{dt^2} + \alpha_k^2\varphi_k(t) = 0, \tag{54}$$

where

$$\alpha_k^2 = \frac{\sigma_k^2 - A^2}{B^2}. \tag{55}$$

The solutions to (54) are of the form

$$\varphi_k(t) = K_1\cos\alpha_k t + K_2\sin\alpha_k t. \tag{56}$$

For $\varphi_k(t)$ to generate a set $\{\varphi_k(t)\}$ satisfying the
conditions expressed by (52) and (53), all $\alpha_k$ must be positive
and real, or (55) must be restricted by the inequality

$$\sigma_k^2 > A^2. \tag{57}$$

To satisfy the condition expressed by (53),

$$\alpha_k = \frac{k\pi}{T} \quad \text{and} \quad K_1 = 0. \tag{58}$$

The corresponding eigenfunctions are

$$\varphi_k(t) = \sqrt{\frac{2}{T}}\,\sin\frac{k\pi t}{T} \qquad \text{for } k = 1, 2, \ldots, N. \tag{59}$$

Using (58) and (55), one can find the eigenvalues $\sigma_k^2$
corresponding to the eigenfunctions given by (59), namely,

$$\sigma_k^2 = \frac{k^2\pi^2B^2}{T^2} + A^2. \tag{60}$$

Let $T$ now be reduced to $T/u$, and let the new eigenvalues and
eigenfunctions be $\tau_k^2$ and $\psi_k(t)$, respectively; then (60)
will transform into

$$\tau_k^2 = \frac{k^2u^2\pi^2B^2}{T^2} + A^2, \tag{61}$$

and (59) into

$$\psi_k(t) = \sqrt{\frac{2u}{T}}\,\sin\frac{ku\pi t}{T} \qquad \text{for } k = 1, 2, \ldots, N. \tag{62}$$
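The results of this Appendix can be checked numerically with a small sketch. It assumes the eigenfunctions $\varphi_k(t) = \sqrt{2/T}\sin(k\pi t/T)$ and eigenvalues $\sigma_k^2 = A^2 + k^2\pi^2B^2/T^2$, then tests orthonormality with a midpoint rule and evaluates the residual of the differential equation $B^2\varphi_k'' + (\sigma_k^2 - A^2)\varphi_k = 0$ with a central difference. The function names, step sizes, and parameter values are illustrative choices, not part of the original analysis.

```python
import math


def phi(k, t, T):
    """Assumed eigenfunction sqrt(2/T) * sin(k*pi*t/T)."""
    return math.sqrt(2.0 / T) * math.sin(k * math.pi * t / T)


def inner_product(i, j, T, steps=2000):
    """Midpoint-rule approximation of the integral of phi_i * phi_j over (0, T)."""
    h = T / steps
    return h * sum(
        phi(i, (n + 0.5) * h, T) * phi(j, (n + 0.5) * h, T) for n in range(steps)
    )


def eigenvalue(k, A2, B2, T):
    """Assumed eigenvalue sigma_k^2 = A^2 + k^2 * pi^2 * B^2 / T^2."""
    return A2 + B2 * (k * math.pi / T) ** 2


def ode_residual(k, t, A2, B2, T, h=1e-3):
    """B^2 * phi'' + (sigma_k^2 - A^2) * phi, with phi'' from a central difference."""
    second = (phi(k, t + h, T) - 2.0 * phi(k, t, T) + phi(k, t - h, T)) / h ** 2
    return B2 * second + (eigenvalue(k, A2, B2, T) - A2) * phi(k, t, T)


T, A2, B2 = 2.0, 1.0, 0.5
same = inner_product(2, 2, T)         # close to 1
cross = inner_product(1, 3, T)        # close to 0
residual = ode_residual(2, 0.7, A2, B2, T)   # close to 0
```

The residual is not exactly zero because the second derivative is approximated by a finite difference; it shrinks as the step `h` is reduced, up to rounding error.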
Some Results on the Problem of
Discriminating Between Two Gaussian Processes*


P. BELLO†, ASSOCIATE MEMBER, IRE


Summary—This paper is concerned with certain aspects of the
problem of discriminating between two Gaussian processes. The 
emphasis is on determining approximate optimum detector structures 
which avoid some of the mathematical difficulties inherent in the 
evaluation of the exact optimum detector structure. To this end, an 
approach termed the ‘‘inverse operator’? approach is presented 
which leads to approximate detector structures via the Neumann 
series expansion of linear operator theory. These approximate 
detectors are found by using a finite number of terms in an ‘‘optimum
detector’’ expansion which results from the use of the above
Neumann series expansion. A sufficient condition for the rapid 
convergence of the optimum detector expansion is found to be 
that the eigenvalues of a certain operator have magnitudes much 
less than unity. An upper bound is derived for the error incurred 
at the detector output by the use of a finite number of terms in 
the optimum detector expansion. Error probabilities are calculated 
for the case in which the detector outputs may be assumed approx- 
imately normally distributed. From an output ‘‘signal-to-noise”’ 
ratio point of view, it is shown that the performance of the optimum 
detector and approximate detectors will differ negligibly if the above 
eigenvalues have squared magnitudes much less than unity. Some 
upper bounds are derived for the largest eigenvalue (magnitude). 


I. INTRODUCTION 


THE problem of detecting normal signals in normal additive noise
has been studied by Price [1]-[4], Middleton [5]-[7], Turin [8], [9],
and Kailath [10], [11]. Price’s [1]-[3], Turin’s, and Kailath’s work
pursue the question of scatter-path communications and are concerned
with an optimum probability computing receiver for N-ary signals.
While Middleton studies only the binary situation, he considers
[5], [6] a general decision
theory formulation of the problem of detecting the 
presence or absence of a normally distributed process in 
background normal noise. More recently [7], Middleton 
has been concerned with the somewhat more general 
problem of discriminating between two Gaussian processes. 
While the results of the above analyses are of great 
interest, they are generally not in a form which lends 
itself readily to application. The reason for this is that 
in the case of discrete sampling, one must invert matrices 
of high order, while in the case of continuous observation, 
one must find the solution to certain integral equations 
for which the method of solution is usually not known. 
The purpose of the present paper is threefold: 


1) To present, in the case of continuous observation, a 
derivation of the optimum detector structure for 
discriminating between two Gaussian processes, 


* Received by the PGIT, September 20, 1960. 
† Adcom, Inc., Cambridge, Mass. Formerly Appl. Res. Lab.,
Sylvania Electronic Systems, Waltham, Mass. 
   termed the ‘‘inverse operator’’ approach, which
   is an alternate approach to Middleton’s [7].

2) To obtain, via the Neumann series expansion and the
   inverse operator approach, approximations to the
   optimum detector (valid in certain practically
   meaningful situations) which require the inversion
   of at most one linear operator (and sometimes none).

3) To examine some of the approximate detector
   structures in 2).

The derivation is carried through for complex-valued Gaussian
processes, thereby making the results applicable to a wide class of
nonstationary Gaussian narrow-band processes (and, of course, to
narrow-band stationary processes in general).


II. THE INVERSE OPERATOR APPROACH


The statistical problem we are concerned with is the following: Given a finite record of a complex-valued normally distributed process Z(t), find an optimum test to determine whether this process has covariance K(t, s) or L(t, s) (assuming it is known a priori that only these two covariances are possible). Because of the continuum of values present in Z(t), it is necessary in forming probability density functions to deal, at least initially, with some representation which involves only a finite number of random variables. Two representations have been used in the past:


1) Representation by discrete samples of Z(t); see Middleton [5], [7] and Price [1]-[3].

2) Representation by finite orthogonal series; see Grenander [12] and Davenport and Root [13].


The method of approach is to solve the statistical 
problem with the finite variable approximation, and then 
allow the approximation to become arbitrarily fine. Thus, 
in the discrete sample approximation, the number of
samples in the finite record of Z(t) would be allowed to 
increase indefinitely, while in the orthogonal representa- 
tion of Z(t) arbitrarily high-order terms would be included 
in the expansion of Z(t). We will use the latter method 
of representation to derive the inverse operator approach. 

It is presumed that the reader is familiar with the Karhunen-Loève orthonormal series expansion of a random process.¹ Briefly, if Z(t) has a continuous covariance function K(t, s) defined as

$$K(t, s) = \overline{Z^*(t) Z(s)} = K^*(s, t), \tag{1}$$

then it may be represented by the series

$$Z(t) = \sum_{n=1}^{\infty} \zeta_n \phi_n(t), \tag{2}$$

where the $\zeta_n$ are uncorrelated random variables given by

$$\zeta_n = \int_T Z(t)\, \phi_n^*(t)\, dt, \tag{3}$$

and $\phi_n(t)$ is the nth eigenfunction of the homogeneous Fredholm integral equation,

$$\int_T K(s, t)\, \phi_n(s)\, ds = \lambda_n \phi_n(t), \tag{4}$$

in which $\lambda_n$ is the nth eigenvalue and $\phi_j(t)$ is orthogonal to $\phi_k(t)$ for $j \neq k$. The subscript T denotes an interval of integration of duration T. It should be noted that since we assume that Z(t) is normally distributed, $\zeta_n$ is also normally distributed. In this case, $\zeta_j$ is not only uncorrelated with, but also independent of, $\zeta_k$ for $j \neq k$. It is readily found by direct evaluation from (3) and use of (4) that

$$\overline{|\zeta_n|^2} = \lambda_n \geq 0, \tag{5}$$

i.e., that the variance of the nth coefficient is just the nth eigenvalue of (4). The joint probability density function of the first N coefficients is

$$W_1(\zeta_1, \zeta_2, \cdots, \zeta_N) = \left(\frac{1}{\pi}\right)^{N} \frac{1}{\prod_{n=1}^{N} \lambda_n} \exp\left[-\sum_{n=1}^{N} \frac{|\zeta_n|^2}{\lambda_n}\right]. \tag{6}$$

¹ See [13], p. 96.
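As an illustration only, the relation between (4) and (5) can be checked in a finite-dimensional setting, where the covariance kernel is replaced by a sampled covariance matrix and the eigenfunctions by eigenvectors. The sketch below assumes NumPy; the exponential covariance, the sample count, and the helper name `kl_basis` are hypothetical choices and do not come from the paper.

```python
import numpy as np

def kl_basis(K):
    """Eigen-decompose a Hermitian covariance matrix (discrete analog of (4))."""
    lam, phi = np.linalg.eigh(K)   # columns of phi are orthonormal eigenvectors
    return lam, phi

# Hypothetical example: exponential covariance sampled at 6 instants.
t = np.arange(6)
K = np.exp(-0.5 * np.abs(t[:, None] - t[None, :]))
lam, phi = kl_basis(K)

# The coefficients are zeta = phi^H z, so their covariance is phi^H K phi.
# It should be diagonal with the eigenvalues on the diagonal, which is the
# discrete counterpart of (5): uncorrelated coefficients with variance lambda_n.
coef_cov = phi.conj().T @ K @ phi
print(np.round(coef_cov, 6))
```

Because this example covariance is real and symmetric, the conjugation convention in (1) plays no role here; for a complex covariance the ordering of the conjugate would have to be matched to (1).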


If the covariance function of Z(t) were L(t, s), then an entirely analogous development would follow where now we have eigenfunctions $\psi_n(t)$, eigenvalues $\mu_n$, and coefficients $\eta_n$. The joint probability density function of the first N coefficients would then be

$$W_2(\eta_1, \eta_2, \cdots, \eta_N) = \left(\frac{1}{\pi}\right)^{N} \frac{1}{\prod_{n=1}^{N} \mu_n} \exp\left[-\sum_{n=1}^{N} \frac{|\eta_n|^2}{\mu_n}\right]. \tag{7}$$

For the subsequent development, it will be necessary to represent the summation in the argument of the exponents in (6) and (7) in an integral form,


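$$\begin{aligned}
\sum_{n=1}^{N} \frac{|\zeta_n|^2}{\lambda_n} &= \sum_{n=1}^{N} \frac{\zeta_n^*}{\lambda_n} \int_T Z(t)\, \phi_n^*(t)\, dt = \int_T Z(t)\, Y_N^*(t)\, dt \\
\sum_{n=1}^{N} \frac{|\eta_n|^2}{\mu_n} &= \sum_{n=1}^{N} \frac{\eta_n^*}{\mu_n} \int_T Z(t)\, \psi_n^*(t)\, dt = \int_T Z(t)\, X_N^*(t)\, dt
\end{aligned} \tag{8}$$

where

$$\begin{aligned}
Y_N(s) &= \sum_{n=1}^{N} \frac{\zeta_n}{\lambda_n}\, \phi_n(s) \\
X_N(s) &= \sum_{n=1}^{N} \frac{\eta_n}{\mu_n}\, \psi_n(s).
\end{aligned} \tag{9}$$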
If both sides of the first equation in (9) are multiplied by K(s, t) and integrated with respect to s over the interval T, and if both sides of the second equation are multiplied by L(s, t) and integrated in the same way, it is found that $Y_N$ and $X_N$ are solutions of the following Fredholm integral equations:

$$\begin{aligned}
\int_T K(s, t)\, Y_N(s)\, ds &= Z_N^{1}(t); \quad t \in T \\
\int_T L(s, t)\, X_N(s)\, ds &= Z_N^{2}(t); \quad t \in T
\end{aligned} \tag{10}$$

in which

$$\begin{aligned}
Z_N^{1}(t) &= \sum_{n=1}^{N} \zeta_n \phi_n(t) \\
Z_N^{2}(t) &= \sum_{n=1}^{N} \eta_n \psi_n(t);
\end{aligned} \tag{11}$$

i.e., $Z_N^{1}(t)$ is the first N terms in the orthonormal expansion of Z(t) with basis functions $\phi_n(t)$, while $Z_N^{2}(t)$ is the first N terms in the orthonormal expansion of Z(t) with basis functions $\psi_n(t)$. (The superscripts 1 and 2 are labels, not powers.) Eqs. (10) can be written in linear operator form as

$$\begin{aligned}
K Y_N &= Z_N^{1} \\
L X_N &= Z_N^{2}
\end{aligned} \tag{12}$$

where the K and L operators are readily identified in (10). When the inverse operators exist, they are linear, and provide solutions for $Y_N$ and $X_N$ by the formal operations²

$$\begin{aligned}
Y_N &= K^{-1} Z_N^{1} \\
X_N &= L^{-1} Z_N^{2}.
\end{aligned} \tag{13}$$

In general, the inverse of an integral operator (such as K and L usually are) is not an integral operator. However, if one permits the use of delta functions and their derivatives, one may frequently be able to express the operations $K^{-1}$ and $L^{-1}$ in the formalism of integral operators. In such cases, one may write

$$\begin{aligned}
Y_N(t) &= \int_T K^{-1}(s, t)\, Z_N^{1}(s)\, ds; \quad t \in T \\
X_N(t) &= \int_T L^{-1}(s, t)\, Z_N^{2}(s)\, ds; \quad t \in T
\end{aligned} \tag{14}$$

where $K^{-1}(s, t)$ and $L^{-1}(s, t)$ are kernels which may contain delta functions and derivatives of delta functions. Formally, then, we may rewrite (6) and (7) as

$$\begin{aligned}
W_1(\zeta_1, \zeta_2, \cdots, \zeta_N) &= \left(\frac{1}{\pi}\right)^{N} \frac{1}{\prod_{n=1}^{N} \lambda_n} \exp\left[-\iint_T Z(t)\, Z_N^{1\,*}(s)\, K^{-1}(t, s)\, dt\, ds\right] \\
W_2(\eta_1, \eta_2, \cdots, \eta_N) &= \left(\frac{1}{\pi}\right)^{N} \frac{1}{\prod_{n=1}^{N} \mu_n} \exp\left[-\iint_T Z(t)\, Z_N^{2\,*}(s)\, L^{-1}(t, s)\, dt\, ds\right]
\end{aligned} \tag{15}$$

where the readily deduced symmetry properties

$$\begin{aligned}
K^{-1}(s, t) &= \left(K^{-1}(t, s)\right)^* \\
L^{-1}(s, t) &= \left(L^{-1}(t, s)\right)^*
\end{aligned} \tag{16}$$

have been used.

² By $K^{-1}$ we mean the linear operator which provides a solution to the equation Ky = z, where y and z are suitably well-behaved functions.


As is well known, an optimum rule for categorizing a set of samples as coming from one of two possible populations is the Neyman-Pearson rule. This involves a computation of the ratio of the two possible density functions (the likelihood ratio) when the actual sample values are inserted into the argument of the density functions, and a comparison of this ratio with a threshold. Other optimum rules are known [5]. However, all perform the same operation, although with different thresholds. In the case of normal density functions, it is more convenient (and just as valid) to compare the logarithm of the likelihood ratio with a threshold. Examination of (15) indicates that if N coefficients in the orthonormal expansion of Z(t) are used, the optimum test involves a computation of

$$Q_N = \iint_T Z(t) \left\{ Z_N^{1\,*}(s)\, K^{-1}(t, s) - Z_N^{2\,*}(s)\, L^{-1}(t, s) \right\} dt\, ds \tag{17}$$

followed by a comparison with a threshold. Now we desire to let N increase without bound so as to have a complete characterization of the observed record. If the Singular case³ is not to occur (where it is possible to achieve a perfect test with a finite record of data), then $Q_N$ must remain finite as $N \to \infty$. Moreover, note that

$$\begin{aligned}
\lim_{N \to \infty} Z_N^{1}(t) &= Z(t) \\
\lim_{N \to \infty} Z_N^{2}(s) &= Z(s).
\end{aligned} \tag{18}$$

Thus in the Regular (or Nonsingular) case, the optimum detector involves the computation of

$$Q = \iint_T Z(t)\, Z^*(s) \left\{ K^{-1}(t, s) - L^{-1}(t, s) \right\} dt\, ds \tag{19}$$

which is the desired result of this section.
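A finite-dimensional sketch may help fix ideas about (19). It assumes that the record is replaced by a vector of samples, that the covariance kernels become Hermitian positive definite matrices, and that the samples are circular complex Gaussian; under those assumptions the quadratic form differs from the log-likelihood ratio only by a data-independent constant. NumPy is assumed, and the two covariances and the function names are hypothetical.

```python
import numpy as np

def log_density(z, C):
    """Log of the circular complex Gaussian density with covariance C."""
    n = len(z)
    _, logdet = np.linalg.slogdet(C)
    quad = np.real(z.conj() @ np.linalg.solve(C, z))
    return -n * np.log(np.pi) - logdet - quad

def detector_Q(z, K, L):
    """Discrete analog of (19): the quadratic form z^H (K^{-1} - L^{-1}) z."""
    return np.real(z.conj() @ (np.linalg.solve(K, z) - np.linalg.solve(L, z)))

rng = np.random.default_rng(0)
t = np.arange(8)
lag = np.abs(t[:, None] - t[None, :])
K = np.exp(-0.3 * lag) + 0.5 * np.eye(8)   # hypothetical covariance, hypothesis 1
L = np.exp(-0.6 * lag) + 0.5 * np.eye(8)   # hypothetical covariance, hypothesis 2
z = rng.standard_normal(8) + 1j * rng.standard_normal(8)

Q = detector_Q(z, K, L)
llr = log_density(z, L) - log_density(z, K)
# Data-independent offset: log det K - log det L.
offset = np.linalg.slogdet(K)[1] - np.linalg.slogdet(L)[1]
print(Q, llr - offset)
```

Since the offset does not depend on the data, comparing Q with a threshold is equivalent to comparing the log-likelihood ratio with a shifted threshold.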


³ The most recent discussion of the Singular case may be found in [14].


It should be noted that although the derivation has involved the use of eigenfunctions and eigenvalues, the final test (evaluation of Q) involves only a determination of the difference of the inverse kernels followed by a double integration involving the observed data.

One might be tempted to examine (14) and take the limit as $N \to \infty$, arriving at

$$\begin{aligned}
Y(t) &= \int_T K^{-1}(s, t)\, Z(s)\, ds; \quad t \in T \\
X(t) &= \int_T L^{-1}(s, t)\, Z(s)\, ds; \quad t \in T.
\end{aligned} \tag{20}$$

However, this may not be permissible since, as will subsequently be shown, both Y(t) and X(t) are of necessity unbounded if the test is to be Regular. What is permissible is to define the difference

$$W(t) = \lim_{N \to \infty} \left[ Y_N(t) - X_N(t) \right] = \int_T Z(s) \left\{ K^{-1}(s, t) - L^{-1}(s, t) \right\} ds \tag{21}$$

in terms of which

$$Q = \int_T Z(t)\, W^*(t)\, dt < \infty. \tag{22}$$

The unbounded nature of X(t) and Y(t) is deduced from the fact that the integrals

$$\begin{aligned}
\int_T Z(t)\, Y^*(t)\, dt &= \infty \\
\int_T Z(t)\, X^*(t)\, dt &= \infty
\end{aligned} \tag{23}$$

with probability 1 when Z(t) is a complex-valued normally distributed random process with covariance K(t, s) or L(t, s) and (22) is satisfied. To prove these integrals unbounded, assume that Z(t) has covariance K(t, s). From (8) we have

$$\lim_{N \to \infty} \sum_{n=1}^{N} \frac{|\zeta_n|^2}{\lambda_n} = \lim_{N \to \infty} \int_T Z(t)\, Y_N^*(t)\, dt = \int_T Z(t)\, Y^*(t)\, dt \tag{24}$$

when the limit exists. Using (5), we find that the mean value of the sum in (24) equals N. It follows that the infinite sum of mean values does not converge. But, according to Kolmogorov [15], a necessary condition for the convergence of a sum of independent random variables is that the sum of the means converge. Thus we conclude that the first integral in (23) is unbounded. The proof that the second integral in (23) also must be unbounded results trivially from (22). An entirely analogous argument leads to the same results when it is assumed that the covariance of the observed process is L(t, s).

In view of the above, it is likely that although the inverse operators $K^{-1}$ and $L^{-1}$ may individually behave quite radically, their difference will exhibit a much smoother behavior for a Regular test.


III. EXPANSION OF THE OPTIMUM DETECTOR


The inversion of K or L is a difficult task, in general. Thus it appears advisable to consider techniques which alleviate the difficulties of this inversion process. In this section we consider expansions of the optimum detector in a series of quadratic forms which, when applicable, involve the inversion of at most one linear operator. It is clear that such an approach will be practically useful at the present time only if the linear operator to be inverted is in the class of operators for which techniques of inversion are available. In the discussion to follow, it will be seen that the operator to be inverted is one of the following: K, L, K + L, or N (where N is the operator whose kernel is the covariance of an assumed additive noise).


A. Neumann Series Expansion


Approximately optimum detector structures may be found when it is possible to express the operation of the optimum detector in a rapidly convergent series expansion of suboptimum operations. In this case, the first term or so will suffice to represent the actual optimum detector structure. The series expansions of the optimum detector considered here depend upon the use of a linear operator expansion of the type

$$[I + M]^{-1} = I - M + M^2 - M^3 + \cdots = \sum_{n=0}^{\infty} (-1)^n M^n \tag{25}$$

which is called a Neumann Series expansion. It is well known⁴ that a sufficient condition for the convergence⁵ of this expansion is that

$$\| M \|^2 = \max_{x} \frac{(Mx, Mx)}{(x, x)} < 1 \tag{26}$$

where the scalar product definition

$$(x, y) = \int_T x(t)\, y^*(t)\, dt \tag{27}$$

has been used and $\| M \|$ is defined as the norm of the transformation M. Since⁶

$$(Mx, Mx) = (M^*Mx, x), \tag{28}$$

⁴ For example, see [16], ch. IV, sect. 67. This expansion is also called the Liouville-Neumann expansion. The operator $M^s$ is equal to the operator M applied s times in succession, and $M^0 = I$, the identity operator.

⁵ The convergence implied is that the integrated (over T) squared magnitude of the difference between $[M + I]^{-1} x$ and $\sum_{n=0}^{N} (-1)^n M^n x$ approaches zero as $N \to \infty$ (limit in the mean definition).

⁶ The adjoint operator $M^*$ may be defined as that operator which satisfies $(M^*x, y) = (x, My)$. If M were an integral operator with kernel M(t, s), then the adjoint kernel would be $M^*(t, s)$ where the asterisk here has the usual complex conjugate interpretation.
it follows that

$$\| M \|^2 = \max_{x} \frac{(M^*Mx, x)}{(x, x)}. \tag{29}$$

It is well known⁷ that for a symmetric⁸ matrix A, stationary values of the ratio (Ax, x)/(x, x) are eigenvalues of

$$Ax = \lambda x. \tag{30}$$

Thus

$$\max_{x} \frac{|(Ax, x)|}{(x, x)} = \max \{ |\lambda_A| \} \tag{31}$$

where $\{ |\lambda_A| \}$ is the set of absolute values of the eigenvalues of (30).

Since $M^*M$ is both symmetric and positive,⁹ a sufficient condition for the convergence of the Neumann series (25) is that

$$\max \{ \lambda_{M^*M} \} < 1. \tag{32}$$

When M is symmetric, $M^*M = M^2$, and $\max \{\lambda_{M^*M}\} = \max \{\lambda_M^2\}$. In this case, the inequality in (32) may be replaced by

$$\max \{ |\lambda_M| \} < 1. \tag{33}$$
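The convergence condition can be observed numerically in a matrix setting. The following sketch, which assumes NumPy and uses a hypothetical random symmetric matrix scaled so that its largest eigenvalue magnitude is 0.4, compares partial sums of (25) with the exact inverse; the helper name `neumann_inverse` is ours.

```python
import numpy as np

def neumann_inverse(M, terms):
    """Partial sum of (25): sum over n = 0 .. terms-1 of (-1)^n M^n."""
    total = np.zeros_like(M)
    power = np.eye(M.shape[0])        # M^0 = I
    for n in range(terms):
        total = total + (-1) ** n * power
        power = power @ M
    return total

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
M = 0.5 * (A + A.T)                                    # symmetric test matrix
M = 0.4 * M / np.max(np.abs(np.linalg.eigvalsh(M)))    # max |eigenvalue| = 0.4 < 1

exact = np.linalg.inv(np.eye(5) + M)
# Spectral-norm error of the partial sums for increasing numbers of terms.
err = [np.linalg.norm(neumann_inverse(M, n) - exact, 2) for n in (2, 5, 10, 40)]
print(err)
```

For a symmetric matrix the error after n terms is governed by the nth power of the largest eigenvalue magnitude, so the smaller that magnitude, the fewer terms are needed.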


B. Detector Expansions 


One type of expansion of the optimum detector to be discussed depends upon the operators K and L being 'close' to one another and requires the inversion of either K or L. Let us suppose that the inverse of K, $K^{-1}$, may be determined. Let the difference operator D be defined as

$$D = L - K. \tag{34}$$

Then, providing the inverse of K exists,¹⁰ we may express $L^{-1}$ in the form

⁷ See [16], ch. VI, sect. 93, theorem on p. 232. This theorem was pointed out to the author by Dr. T. Kailath, M.I.T. Res. Lab. of Electronics.

⁸ A symmetric operator M is one which equals its adjoint $M^*$, i.e., $M = M^*$.

⁹ A linear operator A is positive if $(Ax, x) \geq 0$ and positive definite if $(Ax, x) > 0$ for $x \neq 0$. In general, the operators K and L are only positive and symmetric. When the original noise processes to be discriminated contain additive white noise, K and L become positive definite. In fact, if the spectral intensity of this white noise is $N_0$, then (Kx, x) and (Lx, x) are $\geq N_0 (x, x) > 0$.

¹⁰ One may rigorously justify the factorization shown in (35) when $K^{-1}$ is a bounded linear operator, i.e., when it has a finite norm. Now (cf. [16], sect. 104), $K^{-1}$ will be bounded if and only if K is positive definite. Thus the validity of the factorizations in (35) may be open to question if K is merely positive. A possible way to proceed in the case that K is only positive is to add $\epsilon I$ to the operators K and L, where $\epsilon$ is a small positive quantity. In this way the new operators $K' = K + \epsilon I$ and $L' = L + \epsilon I$ become positive definite. The $\epsilon$ may be carried through in the subsequent expansions and allowed to approach 0 at the end. Of course, one must establish that the conditions required for the convergence of the series expansions of the optimum detector are not violated as $\epsilon \to 0$. Actually, for Regular tests (the only ones considered here), it is doubtful that one will need to let $\epsilon \to 0$, since the addition of $\epsilon I$ to K and L is equivalent to adding white noise of spectral intensity $\epsilon$ to our original processes. If the test is Regular to start with, the addition of a small amount of white noise cannot make the test Singular, and, on the basis of physical reasoning, cannot change the error probabilities of the optimum detector significantly.
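$$L^{-1} = [K + D]^{-1} = \left\{ [I + DK^{-1}] K \right\}^{-1} = K^{-1} [I + DK^{-1}]^{-1}. \tag{35}$$

According to Section III-A, if

$$\| DK^{-1} \| < 1$$

we may form the expansion

$$[I + DK^{-1}]^{-1} = I - DK^{-1} + (DK^{-1})^2 - \cdots = I + \sum_{j=1}^{\infty} (-DK^{-1})^j \tag{36}$$

which leads to

$$K^{-1} - L^{-1} = K^{-1} D K^{-1} - K^{-1} (DK^{-1})^2 + \cdots = \sum_{j=1}^{\infty} (-1)^{j+1} K^{-1} (DK^{-1})^j. \tag{37}$$

The operation of the optimum detector is thus expressible as a sum of quadratic forms as follows:

$$Q = \left( [K^{-1} - L^{-1}] z, z \right) = \sum_{j=1}^{\infty} (-1)^{j+1} \left( K^{-1} [DK^{-1}]^j z, z \right) = \sum_{j=1}^{\infty} (-1)^{j+1} Q_j \tag{38}$$

where the quadratic form $Q_j$ is given by

$$Q_j = \left( K^{-1} [DK^{-1}]^j z, z \right). \tag{39}$$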
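A matrix sketch of the expansion (38) is given below for illustration. It assumes NumPy, sampled covariance matrices in place of the operators, and a hypothetical pair K, D chosen so that the norm of $DK^{-1}$ is well below unity; the function name `expansion_terms` is ours.

```python
import numpy as np

def expansion_terms(z, K, D, n_terms):
    """Q_j = z^H K^{-1} (D K^{-1})^j z for j = 1 .. n_terms (matrix analog of (39))."""
    Kinv = np.linalg.inv(K)
    DKinv = D @ Kinv
    op = Kinv.copy()
    terms = []
    for _ in range(n_terms):
        op = op @ DKinv                      # op = K^{-1} (D K^{-1})^j
        terms.append(np.real(z.conj() @ op @ z))
    return np.array(terms)

rng = np.random.default_rng(5)
n = 8
t = np.arange(n)
lag = np.abs(t[:, None] - t[None, :])
K = np.exp(-0.3 * lag) + np.eye(n)           # hypothetical K, eigenvalues >= 1
D = 0.2 * np.exp(-1.0 * lag)                 # hypothetical difference operator, norm < 0.5
L = K + D
z = rng.standard_normal(n) + 1j * rng.standard_normal(n)

Qj = expansion_terms(z, K, D, 60)
signs = (-1.0) ** (np.arange(1, 61) + 1)     # (-1)^(j+1)
partial_sums = np.cumsum(signs * Qj)
Q_exact = np.real(z.conj() @ (np.linalg.solve(K, z) - np.linalg.solve(L, z)))
print(partial_sums[:4], Q_exact)
```

Only $K^{-1}$ is needed to form the terms; the inverse of L appears here solely to provide a reference value for comparison.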
If L is invertible, an analogous development yields

$$Q = \sum_{j=1}^{\infty} F_j \tag{40}$$

where

$$F_j = \left( L^{-1} [DL^{-1}]^j z, z \right). \tag{41}$$

An expansion involving the inversion of the sum operator

$$S = K + L \tag{42}$$

is made by noting that

$$K^{-1} - L^{-1} = 2[S - D]^{-1} - 2[S + D]^{-1} = 2S^{-1} \left\{ [I - DS^{-1}]^{-1} - [I + DS^{-1}]^{-1} \right\} \tag{43}$$

and using the appropriate Neumann expansions. Because of the symmetry in the expression in (43), we have the expansion

$$K^{-1} - L^{-1} = 4 \sum_{j\ \mathrm{odd}} S^{-1} (DS^{-1})^j \tag{44}$$

in which only odd powers of $DS^{-1}$ are present. In this case, the optimum detector has the expansion (neglecting the irrelevant factor of 4)

$$Q = \sum_{j\ \mathrm{odd}} V_j = \sum_{j=0}^{\infty} V_{2j+1} \tag{45}$$

where

$$V_j = \left( S^{-1} [DS^{-1}]^j z, z \right). \tag{46}$$


When additive noise is present in the two processes to be discriminated, we may express K and L as

$$\begin{aligned}
L &= A + N \\
K &= B + N.
\end{aligned} \tag{47}$$

Here, N is the linear integral operator whose kernel is the covariance of the additive noise, and A, B are the linear integral operators corresponding to the covariances of the processes that it is desired to discriminate between. If $N^{-1}$ may be obtained, we may expand $L^{-1}$ and $K^{-1}$ as follows:

$$\begin{aligned}
L^{-1} &= N^{-1} [I + AN^{-1}]^{-1} = \sum_{j=0}^{\infty} N^{-1} [AN^{-1}]^j (-1)^j \\
K^{-1} &= N^{-1} [I + BN^{-1}]^{-1} = \sum_{j=0}^{\infty} N^{-1} [BN^{-1}]^j (-1)^j
\end{aligned} \tag{48}$$

provided

$$\| AN^{-1} \| < 1, \quad \| BN^{-1} \| < 1. \tag{49}$$

In this case, the optimum detector has an expansion of the form

$$Q = \sum_{j=1}^{\infty} [Q_{Aj} - Q_{Bj}] (-1)^{j+1} \tag{50}$$

where

$$\begin{aligned}
Q_{Aj} &= \left( N^{-1} [AN^{-1}]^j z, z \right) \\
Q_{Bj} &= \left( N^{-1} [BN^{-1}]^j z, z \right).
\end{aligned} \tag{51}$$

When it is desired to discriminate between the presence and absence of a process in additive noise, one of the pair A, B equals zero. Assuming B = 0, it follows that $Q_{Bj} = 0$, all j, and

$$Q = \sum_{j=1}^{\infty} (-1)^{j+1} \left( N^{-1} [AN^{-1}]^j z, z \right) = \sum_{j=1}^{\infty} (-1)^{j+1} Q_{Aj}. \tag{52}$$

When the additive noise is white

$$N = N_0 I, \quad N^{-1} = \frac{1}{N_0} I \tag{53}$$

and use of (50) or (52) then requires no operator inversions.
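For the white-noise case the remark above can be illustrated by a short matrix sketch in which the expansion (52) is evaluated with matrix-vector products only. NumPy is assumed; the signal covariance A, the value of $N_0$, and the function name are hypothetical, and the direct solve is included only as a reference.

```python
import numpy as np

def white_noise_expansion(z, A, N0, n_terms):
    """Partial sum of (52) with N = N0 * I: no matrix inversion is required."""
    total = 0.0
    v = z.copy()
    for j in range(1, n_terms + 1):
        v = A @ v                                  # v = A^j z
        total += (-1) ** (j + 1) * np.real(z.conj() @ v) / N0 ** (j + 1)
    return total

rng = np.random.default_rng(6)
n = 8
t = np.arange(n)
A = 0.2 * np.exp(-1.0 * np.abs(t[:, None] - t[None, :]))   # hypothetical signal covariance
N0 = 2.0                                                   # hypothetical noise intensity
z = rng.standard_normal(n) + 1j * rng.standard_normal(n)

Q_series = white_noise_expansion(z, A, N0, 40)
# Reference value: z^H (N^{-1} - (A + N)^{-1}) z with K = N (B = 0) and L = A + N.
Q_direct = np.real(z.conj() @ z) / N0 - np.real(
    z.conj() @ np.linalg.solve(A + N0 * np.eye(n), z))
print(Q_series, Q_direct)
```

The first term alone, $(Az, z)/N_0^2$, is a familiar weighted-energy statistic, and the later terms supply corrections whose size is set by the ratio of the largest eigenvalue of A to $N_0$.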
IV. CONDITIONS FOR RAPID CONVERGENCE OF DETECTOR EXPANSION


In this section we will be concerned with establishing sufficient conditions for the rapid convergence of the sequence of quadratic forms involved in the optimum detector expansion of the previous section. Consider first the expansion (38). We may express the ratio of an odd term (term with odd subscript) to the previous even term (term with even subscript) as

$$\frac{Q_{2j+1}}{Q_{2j}} = \frac{\left( K^{-1} [DK^{-1}]^{2j+1} z, z \right)}{\left( K^{-1} [DK^{-1}]^{2j} z, z \right)} = \frac{\left( [K^{-1}D]^j K^{-1} D [K^{-1}D]^j K^{-1} z, z \right)}{\left( [K^{-1}D]^j K^{-1} K [K^{-1}D]^j K^{-1} z, z \right)}. \tag{54}$$

If we define the operator

$$P = [K^{-1} D]^j K^{-1} \tag{55}$$

then the ratio in (54) may be expressed as

$$\frac{Q_{2j+1}}{Q_{2j}} = \frac{(PDPz, z)}{(PKPz, z)} = \frac{(DPz, Pz)}{(KPz, Pz)} = \frac{(Dy_j, y_j)}{(Ky_j, y_j)} \tag{56}$$

where the function $y_j$ is defined as

$$y_j = Pz. \tag{57}$$
By a simple application of the calculus of variations, one may determine that stationary values of the ratio¹¹

$$\frac{(Dy, y)}{(Ky, y)} \tag{58}$$

where D, K are symmetric, are equal to the eigenvalues¹² of

$$Dy = \lambda K y. \tag{59}$$

Since D and K are symmetric, the eigenvalues are real. If K is positive (as it is here), the maximum value of the magnitude of the ratio in (58) will be a stationary value and thus equal to the absolute value of the eigenvalue of (59) with the largest magnitude. Note, however, that we may assume this absolute value is finite only if K is positive definite [since then (Ky, y) has a positive greatest lower bound]. Since K is invertible (by hypothesis), we have an equivalent eigenvalue problem¹³

$$K^{-1} D y = \lambda y \tag{60}$$


¹¹ In the analogous matrix problem, we have Courant's Theorem
Ole, Ds XO Ot (IAD ; ; 

¹² For some discussion of the analogous matrix eigenvalue equation, see [18], p. 74. See also problems (78), (80), (83), and (85) on pp. 115 and 116.

¹³ As already mentioned,¹⁰ when K is positive, but not positive definite, $K^{-1}$ will be an unbounded linear operator. This does not prevent $K^{-1}D$ from being a bounded linear operator. Strictly speaking, one may rigorously justify the equivalence of the eigenvalue problems (59) and (60) only if K is positive definite. When K is positive, we may invoke the $\epsilon$ argument¹⁰ to justify our manipulations.
where $K^{-1}D$ is a nonsymmetric operator. If we denote the eigenvalue with the largest magnitude as $\rho$, then

$$| Q_{2j+1} | \leq | \rho |\, Q_{2j}. \tag{61}$$

Note that $Q_{2j}$ is non-negative, while $Q_{2j+1}$ can be positive or negative.¹⁴ It follows from (61) and the fact that equality may be obtained in (61) when $y_j$ is the eigenfunction of (59) corresponding to the eigenvalue $\rho$, that $|\rho| < 1$ is both a necessary and sufficient condition for convergence of the detector expansion (38). This latter statement presumes that $Q_1$ and $Q_2$ are finite. Noting that $Q_{2j}$ is positive, we deduce from (61) that

$$Q_{2j+s} = \epsilon_s\, | \rho |^s\, Q_{2j} \tag{62}$$

where $-1 \leq \epsilon_s \leq 1$ for s odd and $0 \leq \epsilon_s \leq 1$ for s even. It then follows that (38) may be expressed as

$$Q = \sum_{k=1}^{2j} (-1)^{k+1} Q_k + \epsilon\, Q_{2j} \tag{63}$$

where the error coefficient $\epsilon$ is bounded by

$$| \epsilon | \leq \frac{| \rho |}{1 - | \rho |}. \tag{64}$$

One may conclude that rapid convergence will be assured if

$$| \rho | \ll 1 \tag{65}$$

and this being the case, the magnitude of the error in using the first two terms is bounded by

$$| \epsilon |\, Q_2 \leq \frac{| \rho |}{1 - | \rho |}\, Q_2 \approx | \rho |\, Q_2. \tag{66}$$

It should be noted that (61) yields an upper bound on a term $Q_j$ relative to a preceding even-order term.¹⁵ Thus, as far as the present development is concerned, we must include at least the first two terms in a discussion of error bounds.
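The bound (61) can be checked numerically in a matrix setting. In the sketch below (NumPy assumed) K is a hypothetical positive definite matrix, D a hypothetical symmetric and sign-indefinite matrix of small norm, and $\rho$ is taken as the largest-magnitude eigenvalue of $K^{-1}D$; the helper name `quadratic_terms` is ours.

```python
import numpy as np

def quadratic_terms(z, K, D, n_terms):
    """Return Q_1 .. Q_n with Q_j = z^H K^{-1} (D K^{-1})^j z."""
    Kinv = np.linalg.inv(K)
    op = Kinv.copy()
    out = []
    for _ in range(n_terms):
        op = op @ D @ Kinv
        out.append(np.real(z.conj() @ op @ z))
    return np.array(out)

rng = np.random.default_rng(2)
n = 8
t = np.arange(n)
K = np.exp(-0.3 * np.abs(t[:, None] - t[None, :])) + np.eye(n)
S = rng.standard_normal((n, n))
D = S + S.T                                   # symmetric, sign-indefinite
D = 0.3 * D / np.linalg.norm(D, 2)            # spectral norm 0.3
z = rng.standard_normal(n) + 1j * rng.standard_normal(n)

Q = quadratic_terms(z, K, D, 6)               # Q[0] is Q_1, Q[1] is Q_2, ...
rho = np.max(np.abs(np.linalg.eigvals(np.linalg.solve(K, D))))
for j in (1, 2):
    # Compare |Q_{2j+1}| with |rho| * Q_{2j}.
    print(abs(Q[2 * j]), rho * Q[2 * j - 1])
```

In this example the even-order terms come out non-negative, as footnote 14 predicts for positive $K^{-1}$, while the odd-order terms may take either sign.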
Exactly analogous statements may be made with regard to the expansion (40) where the appropriate eigenvalue equation is

$$Dz = \lambda L z \quad \text{or} \quad L^{-1} D z = \lambda z \tag{67}$$

and the symbol F replaces Q on the right side of (63) and (66).


¹⁴ A quadratic form $G_j$ with a generic expression

$$G_j = \left( T_1 [T_2 T_1]^j z, z \right)$$

in which $T_1$, $T_2$ are symmetric may be expressed as

$$G_j = \begin{cases} (T_2 f_j, f_j); & j \text{ odd} \\ (T_1 h_j, h_j); & j \text{ even} \end{cases}$$

where $f_j = T_1 [T_2 T_1]^{(j-1)/2} z$ and $h_j = [T_2 T_1]^{j/2} z$. Thus, the term $G_j$ will be non-negative for odd j, if $T_2$ is positive and non-negative for even j, if $T_1$ is positive.

¹⁵ This restriction could be removed if D were positive definite. Then one would be able to show that $Q_{j+1} \leq | \rho |\, Q_j$ for j odd or even.


The situation is somewhat different in the expansion in (45) where we may express

$$Q = \sum_{\substack{j = 1 \\ j\ \mathrm{odd}}}^{n} V_j + \epsilon\, V_n \tag{68}$$

in which

$$| \epsilon | \leq \frac{| \rho |^2}{1 - | \rho |^2}. \tag{69}$$

In (69), $\rho$ is the eigenvalue of

$$Dz = \lambda S z \quad \text{or} \quad S^{-1} D z = \lambda z \tag{70}$$

with the largest magnitude. Since even-order terms are absent in the expansion (45), we may obtain $| \epsilon |\, V_1 \approx | \rho |^2\, V_1$ for $| \rho | \ll 1$ as a bound on the magnitude of the error caused by the use of only the first term.

The expansions of (50) and (52) differ from the previous ones in that the operators A and B in the quadratic forms $Q_{Aj}$ and $Q_{Bj}$ (51) are positive, while the operator D in $Q_j$, $F_j$, and $V_j$ is not. As a result, $Q_{Aj}$ and $Q_{Bj}$ are non-negative for all j. Moreover, it may be shown by arguments similar to those leading up to (61) that

$$Q_{C,j+1} \leq \rho_C\, Q_{Cj}; \quad C = A, B \tag{71}$$

where $\rho_C$ is the largest eigenvalue of

$$Cz = \lambda N z \quad \text{or} \quad N^{-1} C z = \lambda z; \quad C = A, B. \tag{72}$$

Since A, B, and N are positive, all eigenvalues of (72) are positive. In view of (71), the expansion (50) may be expressed as

$$Q = \sum_{j=1}^{n} [Q_{Aj} - Q_{Bj}] (-1)^{j+1} + [\epsilon_A Q_{An} - \epsilon_B Q_{Bn}] (-1)^{n} \tag{73}$$

where

$$-\frac{\rho_C^2}{1 - \rho_C^2} \leq \epsilon_C \leq \frac{\rho_C}{1 - \rho_C^2}; \quad C = A, B. \tag{74}$$

It is clear that rapid convergence of (73) will occur when $\rho_A$ and $\rho_B$ are individually much less than unity (assuming of course that $Q_{Aj}$ and $Q_{Bj}$ are bounded).


V. ERROR PROBABILITIES FOR LARGE SAMPLE SIZE


Strictly speaking, a suboptimum detector may be said 
to adequately represent an optimum detector when the 
error probabilities of the associated hypothesis testing 
problem satisfactorily approximate those of the optimum 
detector. Thus a rigorous justification for using the first 
m terms of a series expansion of the optimum detector 
would be a demonstration that the resulting error prob- 
abilities satisfactorily approximate those of the optimum 
detector. Such a demonstration appears to be quite 
difficult except in the case where the observation interval 
T is sufficiently large. For this latter case, the detector 
output probability distributions may be expected to be approximately normally distributed.¹⁶ In this section, we will assume the normal assumption to be sufficiently accurate. When the first term or so in the previous detector expansions does accurately approximate the optimum detector, one may expect the normal approximation to be valid if low error probabilities are to be obtained. This expectation follows from the fact that a rapidly convergent detector expansion implies that the two processes to be discriminated are 'close' to one another, and thus a long observation time is required to discriminate between them.

On the assumption that a quadratic form is normally distributed, only its mean and variance need be computed to give its probability density function. In the discussion to follow we will need the following two statistical averages¹⁷

$$\begin{aligned}
\overline{(Mz, z)} &= \mathrm{Tr}\,(MR) \\
\overline{(Mz, z)(Nz, z)} &= \mathrm{Tr}\,(MR)\, \mathrm{Tr}\,(NR) + \mathrm{Tr}\,[MRNR]
\end{aligned} \tag{75}$$

where the overline denotes an ensemble average, R is an integral operator whose kernel is the covariance of the z process, M, N are symmetrical operators, and

$$\mathrm{Tr}\, P = \int_T P(t, t)\, dt \tag{76}$$

is the trace of the integral operator whose kernel is P(t, s).

¹⁶ See [6], sect. 17.2-1, p. 234, and [19], sect. (3), p. 326.

¹⁷ For a complex normally distributed process (with zero mean) one may demonstrate that

$$\overline{z(t_1) z^*(t_2) z(t_3) z^*(t_4)} = \overline{z(t_1) z^*(t_2)} \cdot \overline{z(t_3) z^*(t_4)} + \overline{z(t_1) z^*(t_4)} \cdot \overline{z(t_3) z^*(t_2)}.$$

Note that for a complex normal process with zero mean $\overline{z(t_1) z(t_3)} = 0$. See [20], p. 72.

In order to keep the following discussion as uncluttered as possible, we will compare the performance of the optimum and suboptimum detector when the threshold levels of the respective detectors are individually adjusted such that for each test the two types of error probabilities are equal. (This would correspond to binary symmetric operation in a communication system.) Then one may readily determine that this common error probability for a test (assuming normal statistics prior to thresholding) is given by

$$\alpha = \phi\left( \frac{| m_1 - m_2 |}{\sigma_1 + \sigma_2} \right) \tag{77}$$

where $m_{1,2}$ and $\sigma_{1,2}^2$ denote the means and variances, respectively, of the detector output (prior to thresholding) for the two possible hypotheses, and

$$\phi(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-y^2/2}\, dy. \tag{78}$$

In the following discussion we will consider only the expansion (38). The other expansions presented in Section III may be handled in an analogous fashion. Let $Q_K^n$, $Q_L^n$ denote the sum of the first n terms in (38) for the hypotheses that the input process has covariance K(t, s) or L(t, s), respectively. When $n = \infty$, we arrive at the optimum detector outputs for the two hypotheses. These will be denoted by $Q_K$ and $Q_L$. With the aid of (75), one may readily determine that

$$\begin{aligned}
\overline{Q_K^n} &= \sum_{j=1}^{n} (-1)^{j+1}\, \mathrm{Tr}\,[P^j] \\
\overline{Q_L^n} &= \mathrm{Tr}\,[P] + (-1)^{n+1}\, \mathrm{Tr}\,[P^{n+1}] \\
\mathrm{Var}\, Q_K^n &= \mathrm{Tr} \left[ \left( \sum_{j=1}^{n} (-1)^{j+1} P^j \right)^2 \right] \\
\mathrm{Var}\, Q_L^n &= \mathrm{Tr} \left[ \left( P + (-1)^{n+1} P^{n+1} \right)^2 \right]
\end{aligned} \tag{79}$$

where $P = K^{-1} D$ and Var X denotes the variance of X. Use of (79) in (77) allows the determination of the error probability $\alpha_n$ associated with a suboptimum detector which uses the first n terms of (38). We will make a detailed comparison between $\alpha$, the error probability of the optimum detector, and $\alpha_1$, the error probability of the suboptimum detector which uses only the first term of (38), i.e., n = 1. From (77) and (79), these error probabilities are¹⁸

$$\begin{aligned}
\alpha &= \phi \left( \frac{\sum_{n=2}^{\infty} (-1)^n\, \mathrm{Tr}\,[P^n]}{\sqrt{\mathrm{Tr}\,[P^2]} + \sqrt{\sum_{n=2}^{\infty} (-1)^n (n - 1)\, \mathrm{Tr}\,[P^n]}} \right) \\
\alpha_1 &= \phi \left( \frac{\mathrm{Tr}\,[P^2]}{\sqrt{\mathrm{Tr}\,[P^2]} + \sqrt{\mathrm{Tr}\,[P^2] + 2\, \mathrm{Tr}\,[P^3] + \mathrm{Tr}\,[P^4]}} \right).
\end{aligned} \tag{80}$$

¹⁸ Both $\lim_{n \to \infty} \mathrm{Tr}\,[P^n]$ and $\lim_{n \to \infty} n\, \mathrm{Tr}\,[P^n]$ must be zero from considerations of convergence of the series for $\overline{Q_K}$ and Var $Q_K$. Actually, as may be seen from (81) and (82), the vanishing of $\mathrm{Tr}\,[P^n]$ and $n\, \mathrm{Tr}\,[P^n]$ is a consequence of the assumptions that $\mathrm{Tr}\,[P^2]$ is finite and that the absolute values of the eigenvalues of P are less than 1.

Now

$$\mathrm{Tr}\,[P^s] = \sum_{j} \lambda_j^s \tag{81}$$

where $\{\lambda_j\}$ is the set of eigenvalues of P. From (81) it is readily shown that

$$\frac{\mathrm{Tr}\,[P^{2+s}]}{\mathrm{Tr}\,[P^2]} = \epsilon_s\, | \rho |^s \tag{82}$$

where $0 \leq \epsilon_s \leq 1$ for s even and $-1 \leq \epsilon_s \leq 1$ for s odd, and $\rho$ is the eigenvalue of $P = K^{-1} D$ that has the largest magnitude. By factoring $\mathrm{Tr}\,[P^2]$ from the numerator and $\sqrt{\mathrm{Tr}\,[P^2]}$ from the denominator of (80), and then using (82) followed by appropriate Taylor series expansions, one finds that

$$\begin{aligned}
\alpha &= \phi \left( \tfrac{1}{2} \sqrt{\mathrm{Tr}\,[P^2]} \left\{ 1 - \tfrac{1}{2} \epsilon_1 | \rho | + \tfrac{1}{4} \epsilon_2 | \rho |^2 \right\} \right) \\
\alpha_1 &= \phi \left( \tfrac{1}{2} \sqrt{\mathrm{Tr}\,[P^2]} \left\{ 1 - \tfrac{1}{2} \epsilon_1 | \rho | - \left( \tfrac{1}{4} \epsilon_2 - \tfrac{1}{2} \epsilon_1^2 \right) | \rho |^2 \right\} \right)
\end{aligned} \tag{83}$$

where we have included terms of no higher order than $| \rho |^2$ in the $\phi$ arguments. Further Taylor series expansions¹⁹ yield


eS aleigacall aes cn) il): (84) 


a 


aod 


The comparison of two tests on the basis of error 
probabilities is, for small error probabilities, a very con- 
servative type of comparison. When error probabilities 
are small, large percentage changes in error probabilities 
can frequently be caused by very small changes in observa- 
tion time, SNR, or other parameters. For communication 
or radar applications, where SNR’s are readily changed, 
it would appear that a better comparison of two tests 
would be on the basis of a suitably defined SNR. In our 
problem, a suitable definition of SNR appears to be the 
argument of the error function in (77), i.e., the difference
between the means under the two hypotheses divided 
by the sum of the standard deviations. For Gaussian 
detector outputs (prior to thresholding), such a definition 
results in the optimum detector (error-probability-wise) 
having the maximum SNR. 
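The two statistical averages in (75), on which the means and variances used in this section rest, can be verified algebraically in a matrix setting. The sketch below assumes NumPy, treats (Mz, z) as $z^H M z$, and builds the fourth moment from the factorization rule quoted in footnote 17; the matrices R, M, and N are hypothetical random examples.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
R = G @ G.conj().T               # R[a, b] plays the role of E[z_a z_b^*]
M = rng.standard_normal((n, n))
M = M + M.T                      # symmetric test operators
N = rng.standard_normal((n, n))
N = N + N.T

# Fourth moment from the factorization rule of footnote 17:
# E[z_a z_b^* z_c z_d^*] = R_ab R_cd + R_ad R_cb
fourth = np.einsum('ab,cd->abcd', R, R) + np.einsum('ad,cb->abcd', R, R)

# z^H M z = sum over a, b of M[b, a] z_a z_b^*
mean_direct = np.einsum('ba,ab->', M, R)
corr_direct = np.einsum('ba,dc,abcd->', M, N, fourth)

mean_trace = np.trace(M @ R)
corr_trace = np.trace(M @ R) * np.trace(N @ R) + np.trace(M @ R @ N @ R)
print(corr_direct, corr_trace)
```

Setting N = M and subtracting the squared mean gives the variance $\mathrm{Tr}\,[(MR)^2]$ of a single quadratic form, which is the form in which (75) enters the variance calculations.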

Examination of (83) shows that $\gamma$, the percentage difference in SNR between the optimum detector and suboptimum detector [consisting of the first term in (38)], satisfies the inequality²⁰

$$\gamma \leq \tfrac{1}{2} | \rho |^2. \tag{85}$$

Thus, from an SNR point of view, the condition $| \rho | \ll 1$ is sufficient to justify the use of only the first term in (38).

Presuming that $| \rho | \ll 1$, we see from (83) that a low error probability requires $\mathrm{Tr}\,[P^2]$ to be much larger than unity. The inequalities $| \rho | \ll 1$ and $\mathrm{Tr}\,[P^2] \gg 1$ are not incompatible. From (81), we see that satisfaction of these two inequalities implies that the sequence $\{\lambda_j\}$ must be a very slowly decreasing sequence (where $| \lambda_1 | \geq | \lambda_2 | \geq | \lambda_3 | \geq \cdots$).
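To make the comparison concrete, the sketch below evaluates the SNR (the argument of the Gaussian tail function) and the resulting equal-error probabilities for the optimum detector and for the first-term detector, directly from a set of eigenvalues of $P = K^{-1}D$ under the normal approximation. The closed forms used are our own summation of the trace series for the n = 1 and $n = \infty$ cases; the eigenvalue set is hypothetical, chosen so that $| \rho | \ll 1$ while $\mathrm{Tr}\,[P^2] \gg 1$.

```python
import math
import numpy as np

def tail(x):
    """Gaussian tail probability (1/sqrt(2 pi)) * integral from x to infinity."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def compare_detectors(eigs):
    """Return (alpha, alpha_1, snr, snr_1) from the eigenvalues of P = K^{-1} D."""
    eigs = np.asarray(eigs, dtype=float)

    def tr(s):
        return float(np.sum(eigs ** s))          # Tr[P^s]

    # First-term detector: mean difference Tr[P^2]; the variances are
    # Tr[P^2] and Tr[(P + P^2)^2] = Tr[P^2] + 2 Tr[P^3] + Tr[P^4].
    snr_1 = tr(2) / (math.sqrt(tr(2)) + math.sqrt(tr(2) + 2 * tr(3) + tr(4)))
    # Optimum detector: series summed in closed form (valid for |eigs| < 1).
    mean_diff = float(np.sum(eigs ** 2 / (1 + eigs)))
    var_K = float(np.sum((eigs / (1 + eigs)) ** 2))
    snr = mean_diff / (math.sqrt(tr(2)) + math.sqrt(var_K))
    return tail(snr), tail(snr_1), snr, snr_1

# Hypothetical spectrum: 4000 eigenvalues of magnitude 0.05 with alternating
# signs, so that |rho| = 0.05 and Tr[P^2] = 10.
eigs = 0.05 * np.where(np.arange(4000) % 2 == 0, 1.0, -1.0)
alpha, alpha_1, snr, snr_1 = compare_detectors(eigs)
rel_snr_loss = (snr - snr_1) / snr
print(alpha, alpha_1, rel_snr_loss)
```

For this spectrum the fractional loss in SNR is of the order of $| \rho |^2 / 2$, in line with the discussion above, although other eigenvalue distributions would give different numbers.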


VI. SOME UPPER BOUNDS ON | ρ |


We present here some bounds on $| \rho |$ that may be of some use. A straightforward application of the calculus of variations shows that

$$| \rho |^2 \leq \max_{x} \frac{| (Px, x) |^2}{(x, x)^2} \tag{86}$$
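¹⁹ Note that

$$\phi(A + \delta A) = \phi(A) + \delta A\, \phi'(\xi)$$

where $A < \xi < A + \delta A$, and that

$$\left| \frac{\phi'(A)}{\phi(A)} \right| < A \left[ 1 + \frac{1}{A^2} \right].$$

²⁰ Again ignoring terms of order higher than $\rho^2$.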
where $\rho$ is the eigenvalue of (the possibly nonsymmetric operator) P with the largest magnitude.²¹ An application of the Schwarz Inequality shows that

$$\frac{| (Px, x) |^2}{(x, x)^2} \leq \frac{(Px, Px)(x, x)}{(x, x)^2} = \frac{(Px, Px)}{(x, x)}. \tag{87}$$

Thus

$$| \rho | \leq \| P \| = \sqrt{\max_{x} \frac{(Px, Px)}{(x, x)}}. \tag{88}$$

For an operator M with kernel M(t, s)

$$\begin{aligned}
| (Mx, x) | &= \left| \iint_T M(t, s)\, x(t)\, x^*(s)\, dt\, ds \right| \leq \iint_T | M(t, s) |\, | x(t) |\, | x(s) |\, dt\, ds \\
&= \iint_T \sqrt{| M(t, s) |}\, | x(t) | \cdot \sqrt{| M(t, s) |}\, | x(s) |\, dt\, ds \\
&\leq \sqrt{\iint_T | M(t, s) |\, | x(t) |^2\, dt\, ds}\ \sqrt{\iint_T | M(t, s) |\, | x(s) |^2\, dt\, ds}
\end{aligned} \tag{89}$$

where the last inequality results from an application of the Schwarz Inequality. Let

$$C_M = \max_{t} \left\{ \int_T | M(t, s) |\, ds \right\}, \quad C_{M^*} = \max_{s} \left\{ \int_T | M(t, s) |\, dt \right\}. \tag{90}$$

Then we see that

$$\begin{aligned}
\iint_T | M(t, s) |\, | x(t) |^2\, dt\, ds &\leq C_M \int_T | x(t) |^2\, dt \\
\iint_T | M(t, s) |\, | x(s) |^2\, dt\, ds &\leq C_{M^*} \int_T | x(s) |^2\, ds.
\end{aligned} \tag{91}$$

Using (91) in (89), it follows that

$$\frac{| (Mx, x) |}{(x, x)} \leq \sqrt{C_M C_{M^*}}. \tag{92}$$

If we let $M = P^*P = M^*$, then

$$\| P \|^2 \leq C_{P^*P} = \max_{t} \left\{ \int_T \left| \int_T P(t, z)\, P^*(s, z)\, dz \right| ds \right\} \leq \max_{t} \left\{ \int_T | P(t, s) |\, ds \right\} \max_{s} \left\{ \int_T | P(t, s) |\, dt \right\} = C_P C_{P^*}. \tag{93}$$

Thus we have the series of inequalities²²

$$| \rho |^2 \leq \| P \|^2 \leq C_{P^*P} \leq C_P C_{P^*}. \tag{94}$$

In the case that the kernel P(t, s) has the symmetric form

$$P(t, s) = \sum_{k=1}^{N} f_k(t)\, f_k^*(s)\, r_k(s - t) \tag{95}$$

it is shown in the Appendix that

$$| \rho | \leq \sum_{k=1}^{N} \max_{t} | f_k(t) |^2\, \max_{f} | R_k(f) | \tag{96}$$

where $R_k(f)$ is the Fourier transform of $r_k(\tau)$,

$$R_k(f) = \int_{-\infty}^{\infty} r_k(\tau)\, e^{-j 2\pi f \tau}\, d\tau. \tag{97}$$

It is also shown in the Appendix that for N = 1, $f_1(t) = 1$, and $R_1(f)$ positive, the upper bound given by (96) approaches arbitrarily close to $| \rho |$ as $T \to \infty$.²³

²¹ This result in (86) is more general than (31), where the operator is assumed symmetric.

²² Inequalities of these types for matrix operators can be found in [17], pp. 66 and 67, and [22]. The derivations leading to (94) are a generalization of a derivation shown to the author by Dr. R. Price, M.I.T., Lincoln Lab. Price derived the inequality $| \rho | \leq C_P$ for a symmetric operator, P.
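The chain of inequalities (94) has a direct matrix counterpart, in which $C_P$ and $C_{P^*}$ become the largest absolute row sum and the largest absolute column sum. The sketch below (NumPy assumed, with a hypothetical random nonsymmetric matrix standing in for the kernel) evaluates each quantity in the chain.

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical nonsymmetric matrix standing in for the kernel P(t, s).
P = rng.standard_normal((6, 6)) * rng.uniform(0.1, 1.0, size=(6, 1))

rho = np.max(np.abs(np.linalg.eigvals(P)))        # largest eigenvalue magnitude
norm_P = np.linalg.norm(P, 2)                     # operator norm ||P||
C_P = np.max(np.sum(np.abs(P), axis=1))           # largest absolute row sum
C_Pstar = np.max(np.sum(np.abs(P), axis=0))       # largest absolute column sum
C_PP = np.max(np.sum(np.abs(P @ P.conj().T), axis=1))  # same bound for P P^H

chain = [rho ** 2, norm_P ** 2, C_PP, C_P * C_Pstar]
print(chain)
```

The row and column sums require no eigenvalue computation, which is the practical appeal of the right-hand members of (94).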
When the two processes to be discriminated contain 
an additive white noise component of spectral intensity 
No, we may obtain a useful upper bound on | p |’ as follows: 


] 


le |S) PA) HAO eae aes | 


1 

y4-21 G3 
The second inequality is valid if D and K™ are bounded , 
linear operators.** The last equality (47) follows from 
the fact that || K7* || = 1/No.”° | 

From (98) we deduce that a sufficient condition for - 
using the first term in the expansion of the optimum > 
detector is that the norm (or the largest eigenvalue, 
since they are identical for symmetric operators) of the 
difference operator A — B be much smaller than the 
spectral intensity of the additive white noise. If the 
processes whose covariances are the kernels of the operators" 
A, B are stationary, then from (96) we deduce that 
another (but less tight) sufficient condition is that the — 


23 According to Kailath, this fact has been demonstrated by
Szego [21].

24 See [16], p. 149. 

25 It is assumed that A and B are integral operators, or, more to 
the point, that the eigenvalues of A and B have zero as a limit point. 




maximum value of the difference spectrum be much smaller
than $N_0$. Application of (94) yields

$$\|A - B\| \le \max_{0 \le t \le T} \int_0^T |A(t, s) - B(t, s)|\, ds. \tag{99}$$

Thus another sufficient condition for the use of the
optimum detector expansion is that

$$\max_{0 \le t \le T} \int_0^T |A(t, s) - B(t, s)|\, ds \ll N_0. \tag{100}$$
APPENDIX 
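It is sufficient to demonstrate (96) for N = 1. We have

$$|\rho| = \max_{z(t)} \frac{\left| \int_0^T \int_0^T z(t)\, z^*(s)\, f(t)\, f^*(s)\, r(s - t)\, dt\, ds \right|}{\int_0^T |z(t)|^2\, dt}. \tag{101}$$

Let $z_T(t)$ be a truncated version of z(t) defined by

$$z_T(t) = \begin{cases} z(t), & 0 \le t \le T \\ 0, & \text{elsewhere.} \end{cases} \tag{102}$$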


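Then, by an obvious change in variable, the double
integral in (101) becomes

$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} z_T(t)\, z_T^*(s)\, f(t)\, f^*(s)\, r(s - t)\, dt\, ds = \int_{-\infty}^{\infty} k_z^*(\tau)\, r(\tau)\, d\tau \tag{103}$$

where $k_z(\tau)$ is the autocorrelation function of $z_T(t) f(t)$,

$$k_z(\tau) = \int_{-\infty}^{\infty} z_T^*(t)\, f^*(t)\, z_T(t + \tau)\, f(t + \tau)\, dt. \tag{104}$$

By Parseval's Theorem,

$$\int_{-\infty}^{\infty} k_z^*(\tau)\, r(\tau)\, d\tau = \int_{-\infty}^{\infty} K_z^*(f)\, R(f)\, df \tag{105}$$

where $K_z(f)$, $R(f)$ are the Fourier transforms of $k_z(\tau)$
and $r(\tau)$ respectively. Noting that $K_z(f)$ is positive,
we find

$$\left| \int_{-\infty}^{\infty} K_z(f)\, R(f)\, df \right| \le \int_{-\infty}^{\infty} |K_z(f)|\, |R(f)|\, df \le \max_f |R(f)| \int_{-\infty}^{\infty} K_z(f)\, df. \tag{106}$$

But

$$\int_{-\infty}^{\infty} K_z(f)\, df = k_z(0) = \int_0^T |z(t)|^2\, |f(t)|^2\, dt \le \max_t |f(t)|^2 \int_0^T |z(t)|^2\, dt. \tag{107}$$

Use of inequalities (106) and (107) shows that

$$|\rho| \le \max_t |f(t)|^2\, \max_f |R(f)|. \tag{108}$$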
When f(t) is unity, it may be readily seen from the
above that $|\rho|$ may be expressed in the form

$$|\rho| = \max_{W(f)} \left| \int_{-\infty}^{\infty} W(f)\, R(f)\, df \right| \le \max_f |R(f)| \tag{109}$$

where W(f) is a positive function with unit area whose
Fourier transform is nonzero only over an interval of
duration T. As $T \to \infty$, W(f) can be made to approach
arbitrarily close to the unit impulse. Note that the bound on
the right side of (109) is actually attained when W(f) is an
impulse located at that value of f for which $|R(f)|$ is
maximum. Thus as $T \to \infty$, $|\rho| \to \max_f |R(f)|$.
Error Correcting Codes and Their Implementation
for Data Transmission Systems*


J. E. MEGGITT†


Summary—Presented here is a practical automatic error-cor- 
recting system that may be applied to many data transmission 
problems. It is particularly suited to the correction of bursts of 
errors and so may be applied to the problem of the transmission 
of data over telephone networks. 

The attractive feature of the system is its remarkable simplicity 
from the point of view of implementation. It is so simple that it can 
readily be incorporated into much existing equipment. 

In the system, messages are transmitted in blocks and each 
block is coded separately. The codes used within the blocks are 
cyclic codes. This means that coders and decoders employ linear 
feedback shift registers to form check digits and to correct errors. 

The basic ideas are presented in terms of the hardware com- 
ponents to which the system gives rise and analyzed afterwards in 
terms of mathematics so that it is easy for the engineer to see, 
at once, what is involved. 

The theory usually applies to binary messages in which data is 
transmitted serially. However, an extension is included which shows 
how the same ideas may be applied to binary codes in which 
information is sent in parallel. 


INTRODUCTION 


HE problem of automatic error correction is a 
Ale one in data transmission theory. Many 
existing communication systems are inherently 
noisy, and one has the simple choice between an ex- 
pensive improvement to them and the adoption of an 
error-correcting system if one wishes to transmit data 
accurately. It is becoming more and more clear that 
computers need to communicate with each other over 
long distances and that data for them must also be com- 
municated, so the problem is one that must be faced. 
The existing telephone system already provides an 
excellent world-wide communication system, so it is 
highly desirable to utilize it for these new purposes. How- 
ever, the system is inherently noisy, largely due to impulse 
noise generated in exchanges, so that any automatic 
error-correcting system which enables it to be employed 
is of great value. In the new data transmission fields, the 
aim is also to transmit information as rapidly as possible, 
because the more information that can be sent per second 
along a given line, the cheaper the transmission. Thus, it 
is even conceivable that it may be worth increasing the 
speed of a reliable system until errors start to occur, and 
this is a sensible thing to do provided, of course, that 
these errors may be automatically corrected. Fortunately, 
there exists a very simple type of error correcting system 


* Received by the PGIT, December 2, 1960. 
which is cheap to implement and which is particularly
suitable for correcting bursts of errors; this will be
described. There is still a fair amount of freedom within
the system so that the code adopted can be matched to
the kind of noise expected. Tests are presently in progress
to determine the noise that different telephone lines
produce, so that shortly it should prove possible to make
an exact recommendation about what code to use in any
given situation.

These remarks have been directed towards telephone
transmission systems. However, the same error correcting
systems find an application in radio transmission and in
the magnetic tape and disk recording of data. In the
latter case, data may be much more densely packed if
errors that are caused by small imperfections in the
magnetic materials can be corrected automatically.


Contents 


The next section will describe the general apparatus
that is required. The following one will give examples
showing how it is employed, and the last will show some
extensions of the theory.


Codes 


The data transmitted is going to be supposed binary,
so that messages consist of zeros and ones. Further, in the
first part it is supposed that it is transmitted serially,
one digit at a time. In the last section of this paper,
there are some extensions for the case where the data is
sent several digits at a time.

The coding procedure consists of splitting the message
into blocks and adding to the information digits in each
block certain redundant digits which are functions of the
information digits. The redundancy is such that even
though errors occur, there is enough information left for
the message to be corrected. The encoding problem is to
form these redundant digits simply, while the decoding
problem is to reconstruct the correct message.

The check digits are made linear functions of the
information digits; i.e., the check digits are just the parities
of certain groups of information digits. This is not
necessary, of course, but it is certainly customary. The problem
of code design that is left, having adopted this general
strategy, is to choose the groups of digits for the parity
checks intelligently, so that the coding process is simple
and so that the sorts of error that are expected to occur
are corrected.
Cyclic Codes


The codes that are considered here are a further
restricted class of parity codes called cyclic codes. These are
interesting because they are extremely simple to
implement and because, historically, they were found to have
many interesting properties. Abramson [1], Melas [2],
and Fire [3], for example, realized that they had these
properties without realizing their inherent simplicity.

Cyclic codes may be defined mathematically or they
may be defined for the engineer by the implementation to
which they give rise. The basis of this implementation is
the feedback linear shift register and the cyclic properties
of this give rise to the name "cyclic code." In this paper,
the engineer's approach will be adopted, and the apparatus
will first be described. Mathematics will only be introduced
to analyze its behavior.


GENERAL ENCODER FOR CYCLIC CODE


When a message block is transmitted, it is arranged to
transmit the information digits first, and to follow them
by the check digits. Let the total number of digits be n,
and the number of check digits k.

The encoder is shown in Fig. 1 and consists mainly
of a feedback shift register of length k. The c's denote
connections to the register that can be made or not.
The choice is left to the designer. It is convenient to
describe the situation by putting c = 1 for a connection
that is made and c = 0 for one that is not. $c_k = 1$, because
otherwise a shorter shift register would suffice.




Fig. 1—Encoder. 


The adders form the modulo two sum of their inputs
and are really just EXCLUSIVE OR gates. The values
that are given to the c's determine the properties of the
codes, and when examples are considered, specific
connections will be shown for specific codes.

The operation of the encoder is that switch A is first
set so that the $n - k$ digits of information are directly
transmitted as they arrive and are also fed to the shifting
register which initially is empty. When all the information
has been sent, the switch A is reversed so that the encoder's
input is isolated, while the output from the shift register
is transmitted. In this way, the k check digits are sent.
While this happens the input to the shift register is zero,
since the adder B has two similar inputs and so its output
is zero. Thus, at the end of the transmission the shift
register is again empty.
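The encoder operation just described can be sketched in a few lines of Python. This is only an illustrative model, not the circuit of Fig. 1 itself: the function names are hypothetical, the register state is a list with the left-hand stage first, and the connections c = (0, 1, 1) are an assumed choice for a three-stage register. It is also assumed that the check digit sent at each step is the output of the feedback adder, so that adder B feeds zero into the register.

```python
def feedback(state, c):
    """Modulo-two sum of the stages whose connection is made (c[i] == 1)."""
    return sum(ci & yi for ci, yi in zip(c, state)) % 2


def shift(state, c, u):
    """One shift: feedback plus input u enters the left-hand stage."""
    return [feedback(state, c) ^ u] + state[:-1]


def encode(info, c):
    """Return the n transmitted digits: information first, then k check digits."""
    k = len(c)
    state = [0] * k                      # register initially empty
    sent = []
    for a in info:                       # switch A passes information through
        sent.append(a)
        state = shift(state, c, a)
    for _ in range(k):                   # switch A reversed: send check digits
        check = feedback(state, c)
        sent.append(check)
        state = shift(state, c, check)   # adder B has equal inputs, so 0 enters
    assert state == [0] * k              # register is empty again
    return sent


C_ASSUMED = [0, 1, 1]                    # assumed connections, k = 3
print(encode([1, 0, 0, 0], C_ASSUMED))   # -> [1, 0, 0, 0, 1, 0, 1]
```

With four information digits and three stages the block length is 7, which matches the cycle length of this particular register.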

The number k is the number of shift stages. n is such
that if the shifting register initially contained 100 ... 0,
and if it were simply fed back, without the presence of the
switch A, then it would contain 100 ... 0 again after
exactly n shifts and not before.
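The definition of n can be checked numerically. The sketch below (the function name `period` and the list representation of the connections are assumptions made for illustration) lets the register run freely from 100 ... 0 and counts the shifts until that state recurs.

```python
def period(c):
    """Shifts needed for the free-running register to return to 100...0."""
    k = len(c)
    start = [1] + [0] * (k - 1)
    state = start
    for n in range(1, 2 ** k):
        fb = sum(ci & yi for ci, yi in zip(c, state)) % 2
        state = [fb] + state[:-1]        # feedback enters the left-hand stage
        if state == start:
            return n
    raise ValueError("100...0 never recurs; is the last connection made?")


print(period([0, 1, 1]))     # three stages, maximal length: 7
print(period([0, 0, 1, 1]))  # four stages, maximal length: 15
print(period([1, 1, 1, 1]))  # four stages, not maximal: 5
```

Only connection sets whose cycle has the maximal length $2^k - 1$ use every nonzero register state, which is why the choice of connections matters for the block length.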

It is now necessary to analyze the code defined by this
encoder. If the digits of the message are $a_1, a_2, \cdots, a_n$ ($a_1$
first), then it is seen that $a_{n-k+1}, \cdots, a_n$ are defined in terms
of $a_1, \cdots, a_{n-k}$, and it is possible to write down this
relationship. To do this, it is necessary to describe the operation
of the shift register mathematically, and for this it is
convenient to denote its contents at any instant by the
vector y and its contents after a shift by Ty, where T
is the k by k matrix

$$T = \begin{bmatrix} c_1 & c_2 & \cdots & c_{k-1} & c_k \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}. \tag{1}$$

(The first element of the column vector y corresponds
to the contents of the left-hand element of the shift
register.)

With this notation, the successive contents of the shift
register in the encoder may be described. Initially it
contains zero. It next contains $a_1 x$, where x is the vector

$$x = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}. \tag{2}$$

When the second digit $a_2$ arrives, it contains

$$a_1 T x + a_2 x.$$

When the third digit $a_3$ arrives, it contains

$$a_1 T^2 x + a_2 T x + a_3 x, \text{ etc.},$$

and this continues until all the information digits have
arrived. However, even when the check digits are being
formed, the check digits $a_{n-k+1} \cdots a_n$ are fed back into
the shift register in exactly the same way. Thus, ultimately
it contains

$$a_1 T^{n-1} x + a_2 T^{n-2} x + \cdots + a_n x. \tag{3}$$


Now, n was chosen so that $T^n x = x$. Further, zero was,
by design, fed back into the shift register for the last k
shifts, so that it ultimately contains zero.

Hence,

$$a_1 T^{-1} x + a_2 T^{-2} x + \cdots + a_{n-1} T^{-(n-1)} x + a_n T^{-n} x = 0, \tag{4}$$

and this is the mathematical definition of a cyclic code.
This form is necessary in order to analyze the errors the
code will correct. It is this set of linear equations that
defines the check digits in terms of the others.
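As a sketch of how (4) can be used in practice, the vectors $T^{-1}x, T^{-2}x, \cdots$ may be generated by running the register backwards, and a received block tested against the sum. The helper names below are hypothetical, the register is modeled as a list with the left-hand stage first, and the connections (0, 1, 1) with block length 7 are an assumed illustration.

```python
def unshift(state, c):
    """Apply T^{-1}: undo one shift of the register (requires c[-1] == 1)."""
    last = state[0]
    for i in range(len(c) - 1):
        last ^= c[i] & state[i + 1]
    return state[1:] + [last]


def check_vectors(c, n):
    """Return [T^{-1}x, T^{-2}x, ..., T^{-n}x] with x = 100...0."""
    v = [1] + [0] * (len(c) - 1)
    out = []
    for _ in range(n):
        v = unshift(v, c)
        out.append(v)
    return out


def satisfies_eq4(word, c):
    """True when the digits a_1..a_n satisfy the defining sum (4)."""
    acc = [0] * len(c)
    for a, v in zip(word, check_vectors(c, len(word))):
        if a:
            acc = [p ^ q for p, q in zip(acc, v)]
    return not any(acc)


C_ASSUMED = [0, 1, 1]
print(check_vectors(C_ASSUMED, 7)[0])                      # T^{-1}x
print(satisfies_eq4([1, 0, 0, 0, 1, 0, 1], C_ASSUMED))     # a valid block
print(satisfies_eq4([1, 0, 0, 0, 1, 0, 0], C_ASSUMED))     # last digit altered
```

The first vector has its single one in the last position, which is what makes the single error detector described later so simple.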


GENERAL DECODER FOR CYCLIC CODES


The decoder for a cyclic code is shown in Fig. 2. It 
contains primarily two shift registers, one of length n 
where the message is stored until it can be corrected. 
Obviously, the message must be stored until the check 
digits arrive because, until they have arrived, it is not 
clear what has to be corrected. The other shift register is 
a feedback one of length k and has the same connections 
made to it as the one in the encoder has. 

The decoder also contains a detector. Its function is to 
detect certain configurations in the lower shift register. 
If one of these is detected, a one is emitted; this inverts 
the digit that is currently leaving the main shift register 
and also adds a one to the adder of the lower shift register. 
The configurations that have to be detected depend on the 
errors the code is designed to correct, and what these 
have to be will be indicated shortly. 

The operation of the decoder is as follows. As the
message is received, it is stored in the main shift register,
while simultaneously the arriving digits are fed to the
lower shift register. During this time, the detector is
disconnected. It is convenient to denote the incoming
digits by $a_1' a_2' \cdots a_n'$ which, of course, may differ from
$a_1 a_2 \cdots a_n$.

The lower shift register is initially made to contain zero,
so that when the first digit $a_1'$ arrives it contains, exactly
as in the case of the encoder, $a_1' x$. It next contains


Fig. 2—Decoder. 
$a_1' T x + a_2' x$, etc., so that when the complete message has
arrived, it contains


$$a_1' T^{-1} x + a_2' T^{-2} x + \cdots + a_n' T^{-n} x = z. \tag{5}$$


If there has been no error, so that the a''s are the same
as the a's, then clearly z = 0 while, in general, when
there have been errors, $z \ne 0$. By design, the z's will be
distinct for the distinct errors that it is required to correct.
At this stage, the apparatus can already be used for error
detection. All errors that lead to nonzero z's will be
detected.

However, when it is used for error correction, the
procedure is that the detector is next switched on, while
the input to the decoder is disconnected from the adder
of the shifting register. This happens when z has been
formed, and shifting is continued. As shifting continues,
the contents of the lower register are operated on by T,
while digits of the uncorrected message leave the storage
register. We choose the detector so that it will detect
the pattern that occurs in the lower register whenever an
erroneous digit has reached the right-hand end of the
storage register. At the next shift, the detector emits a one
which corrects the error, and at the same time a one is
added to the feedback of the lower register to indicate
that a somewhat simpler error pattern now remains to be
corrected. The process continues until the entire message
has left the main store.

It should be observed that no fresh message can be
received while the present one is being corrected. Thus, if
digits arrive continuously, some sort of tandem
arrangement must be devised.


THE DETECTOR


The states the detector has to detect depend on what
the code is designed to do. The simplest detector is for
the case where single error correction is required.

SINGLE ERROR DETECTOR


If there is, say, an error in the rth digit, then

$$a_r' = a_r + 1$$

while all the other a''s are equal to the corresponding a's.
Hence, from (5)

$$z = T^{-r} x. \tag{6}$$

Now the rth digit leaves the main storage register after
(r - 1) shifts. Thus, at this time the contents of the lower
register are

$$T^{r-1} z = T^{r-1}(T^{-r} x) = T^{-1} x.$$

Thus, it is necessary to detect the state $T^{-1}x$, and if this
is done, any single error will be corrected.
Now, since $c_k = 1$,

$$T^{-1} x = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix} \tag{7}$$

for any T. This just means that it is necessary to see when
the last element of the register contains one, while all the
others contain zero, and this specifies the detector.
When the detector operates, it also emits a one into the
adder of the shift register. It, therefore, effectively modifies
its state y, say, to

$$y + T^{-1} x.$$

In this case $y = T^{-1}x$, so that after the next shift, it
contains

$$T(T^{-1} x + T^{-1} x) = 0.$$

The shift register now contains what it would if the
message were correct, and no further correction takes
place.

DOUBLE ADJACENT ERROR DETECTOR


If the code is designed to correct single and double
adjacent errors, then the detector must detect another
state besides $T^{-1}x$.

Suppose there are errors in the rth and (r + 1)th
digits. Then in the same way as before

$$z = (T^{-r} + T^{-(r+1)}) x. \tag{8}$$

The rth digit leaves the main storage register after
(r - 1) shifts so that at this time, the lower register
contains

$$T^{r-1} z = (T^{-1} + T^{-2}) x.$$

Hence, by arranging for the detector to detect
$(T^{-1} + T^{-2})x$, the rth digit will be corrected.

Since it has been arranged to add $T^{-1}x$ to the lower
register when the first correction takes place, the lower
register will contain after the next shift

$$T(T^{-1} + T^{-2} + T^{-1}) x = T^{-1} x.$$

Now this is detected by the detector as for a single
error; thus the (r + 1)th digit is also corrected and the
register is left empty after the next shift.

The state $(T^{-1} + T^{-2})x$ has a form which depends on T
and so must be calculated from a knowledge of T. This
is the form of the detector for a code that corrects single
and double adjacent errors.


DETECTOR FOR BURST ERROR CORRECTION
ERRORS BEING CONFINED TO WITHIN A LENGTH p (p < k)


The theory just presented extends in an obvious way 
to the case where bursts of errors up to those of length p 
have to be corrected. This means that when errors occur, 
they are spread over p or less consecutive digits, though 
not all p digits are necessarily wrong. It is assumed that 
the code used is such that this amount of correction is 
possible, and the problem is to see what the detector for 
this code should consist of. 

The arguments of the last section may be followed
exactly and it is found that for a burst code the detector
must detect $2^{p-1}$ states, corresponding to the $2^{p-1}$ different
burst patterns. These states are of the form

$$z = \Bigl( T^{-1} + \sum_{i=2}^{p} q_i T^{-i} \Bigr) x \tag{9}$$

where the q's take all combinations of values zero and one.
Such a detector is quite easy to build, though it apparently 
gives rise to a certain complexity. However, there is an 
almost trivial arrangement that detects all states of the 
form (9). This is shown in Fig. 3, which shows primarily 
the basic feedback shift register of the decoder, but 
attached to it, the detector. This is designed to operate 
when the first k — p digits of the shift register contain 
zero, while the last p digits are such as to cause the output 
from the feedback adder to be one. The assertion is that 
this arrangement detects all of the states of the form (9). 


Fig. 3. (Diagram labels: INPUT; OUTPUT FROM DETECTOR.)
Proof

From the theory

$$T^{-1} x = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}, \quad T^{-2} x = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 0 \\ 1 \\ t_1 \end{bmatrix}, \quad T^{-3} x = \begin{bmatrix} 0 \\ \vdots \\ 0 \\ 1 \\ t_1 \\ t_2 \end{bmatrix}, \text{ etc.} \tag{10}$$

where, operating with T, it is seen that $t_1, t_2, t_3, \cdots$ are
defined by

$$c_{k-1} + t_1 = 0$$
$$c_{k-2} + c_{k-1} t_1 + t_2 = 0$$
$$c_{k-3} + c_{k-2} t_1 + c_{k-1} t_2 + t_3 = 0. \tag{11}$$

Thus when the shift register contains $T^{-1}x$, the output
from the adder is 1; when the shift register contains
$T^{-2}x$, the output from the adder is $c_{k-1} + t_1 = 0$; when the
shift register contains $T^{-3}x$ the output from the adder is
$c_{k-2} + c_{k-1} t_1 + t_2 = 0$, etc.

Hence, when the shift register contains $T^{-1}x$ and a linear
combination of $T^{-i}x$ ($i = 2, \cdots, p$), the output from the
feedback adder is exactly one and, of course, the first
k - p elements of the shift register are zero. Now, when
the first k - p elements are zero, the output from the
adder is one for $2^{p-1}$ states, and zero for $2^{p-1}$ states.
Hence, the arrangement just described detects exactly
those states of the form (9), and no more, and this
completes the proof.

In the implementation previously described, the de- 
tector emits a one into the adder of the feedback shift 
register when it operates. In the new simplified arrange- 
ment, this may be done by having a separate two-input 
adder in the feedback loop for this purpose. However, it is
simplest to observe that the effect of the addition is to
feed back zero into the shifting register. Consequently,
the same effect may be achieved by arranging for the
operation of the detector to disconnect the feedback loop,
and this is shown in Fig. 3.
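The simplified decoder can be modeled as follows. This is a sketch under stated assumptions rather than a transcription of Fig. 3: the function names are hypothetical, the detector is taken to fire when the first k - p stages are zero and the feedback adder output is one, and the four-stage connections (1, 1, 0, 1) with p = 2 and the block 1001110 are an assumed example.

```python
def feedback(state, c):
    """Modulo-two sum of the stages whose connection is made."""
    return sum(ci & yi for ci, yi in zip(c, state)) % 2


def shift(state, c, u):
    """One shift: feedback plus input u enters the left-hand stage."""
    return [feedback(state, c) ^ u] + state[:-1]


def decode(received, c, p):
    """Correct bursts confined to p digits; return (digits, final register)."""
    k = len(c)
    state = [0] * k
    for a in received:                   # detector disconnected: form z
        state = shift(state, c, a)
    corrected = []
    for a in received:                   # digits leave the main store in order
        fire = int(not any(state[:k - p]) and feedback(state, c) == 1)
        corrected.append(a ^ fire)       # invert the digit now leaving
        state = shift(state, c, fire)    # same as opening the feedback loop
    return corrected, state


C_ASSUMED = [1, 1, 0, 1]                 # assumed connections, k = 4
BLOCK = [1, 0, 0, 1, 1, 1, 0]            # a valid block for these connections
garbled = BLOCK[:]
garbled[2] ^= 1
garbled[3] ^= 1                          # double adjacent error in digits 3, 4
print(decode(garbled, C_ASSUMED, 2))
```

A nonzero register at the end of the run would indicate an error pattern outside the class the detector was built for.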

It should be carefully observed that this theory of the 
detector works so long as the code is such that burst 
correction up to length p is in principle possible. If one 
uses for example, a detector for p = 2 on a code that is 
only designed for single error correction, then what 
happens is that ambiguity occurs between single and 
double adjacent errors. Single errors in certain positions 
will be treated, and erroneously corrected, as double errors 
in other positions. 

It should be noticed also that p < k, because, in fact,
p - 1 digits of the k check digits describe the nature of
the error burst, while the other k - p + 1 have to be
sufficient to position the burst.
DETECTOR FOR GENERAL CYCLIC CODES


A general cyclic code corrects various patterns of errors.
If the code is such that an error pattern $1 q_2 q_3 \cdots$ can be
corrected, then exactly as in (9) the detector in the decoder
should be such that it detects the state


$$z = (T^{-1} + q_2 T^{-2} + q_3 T^{-3} + \cdots) x. \tag{12}$$


It is also necessary that the code should correct, and the 
detector should detect, all simpler patterns, but presum- 
ably this will be required anyway. In general, there will 
not be the attractive simplicity shown in the last section, 
though some simplification is often possible. 


CONCLUSION OF DESCRIPTION OF APPARATUS


This concludes the description of the apparatus. The
main problem that now needs to be discussed is the
connections that need to be made to the shift registers to
produce codes with assigned properties. The next section
will contain some rules for the construction of some very
powerful codes. However, a complete search through all
possible sets of connections might lead to some other
useful codes, and it may be worthwhile to use a computer
to examine exhaustively the properties of all codes that
can be generated using the apparatus.


SINGLE ERROR CORRECTING CODES 


The connections required to produce single error
correcting codes will first be considered.

As has been observed, a single error in the rth digit
leads to $z = T^{-r}x$.

The code will, therefore, be capable of correcting a single
error if all the vectors $T^{-r}x$ are different for different r,
and this is indeed the case by construction. The most
efficient code will be obtained when the shift register is
connected so as to produce a maximal length cycle of
length $2^k - 1$ in it. The cycle structure [4] of shift registers
is generally discussed in terms of the characteristic
equation that the matrix T satisfies, and from (1) it is seen that
this equation is


$$T^k + c_1 T^{k-1} + c_2 T^{k-2} + \cdots + c_{k-1} T + 1 = 0. \tag{13}$$


The characteristic equations that produce maximal
length cycles are well documented and lists of them have
been published. Thus, these may be used for these single
error correcting codes.

Example

A code of message length 7 with 3 check digits is
obtained by using the characteristic equation

$$T^3 + T + 1 = 0 \tag{14}$$

which is a very simple case taken from the list. The
connections to the basic register are shown in Fig. 4.

From (4) it is found that the coding equations are
explicitly

$$a_{5+i} + a_{3+i} + a_{2+i} + a_{1+i} = 0, \quad i = 0, 1, 2.$$


Fig. 4.


DOUBLE ADJACENT ERROR CORRECTING CODES


The connections for these codes will next be considered.
As has been observed in (8), it is required that the vectors

$$T^{-r} x \quad \text{and} \quad T^{-s}(1 + T^{-1}) x$$

all should be different for different values of r and s.
This will be the case if the feedback shifting register is
such that the vectors x and $(1 + T^{-1})x$ lie on different
cycles of the same length.

The most efficient way of achieving this is to take a
characteristic equation that has the form

$$(1 + T) M(kT) = 0 \tag{15}$$

where M(kT) = 0 is a characteristic equation that
produces a maximum length cycle of length $2^k - 1$.


Proof

Consider the k + 1 by k + 1 matrix

$$T = \begin{bmatrix} T_k & 0 \\ 0 & 1 \end{bmatrix} \tag{16}$$

where $T_k$ is a matrix satisfying

$$M(kT_k) = 0. \tag{17}$$

Then, clearly, the characteristic equation that T satisfies
is just (15), though this newly-defined T has a different
form. Now the cyclic structure is determined by the
characteristic equation (except in exceptional
circumstances) that T satisfies, so that this newly-defined T
gives rise to a similar cycle structure. The cycle structure
of this new T is now obvious.

There is a cycle of length $2^k - 1$ with vectors of the form

$$\begin{bmatrix} T_k^{-r} x_k \\ 0 \end{bmatrix}$$

where $x_k$ is the k by 1 column vector

$$x_k = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$

and there is another one of length $2^k - 1$ with vectors of
the form

$$\begin{bmatrix} T_k^{-r} x_k \\ 1 \end{bmatrix}.$$

Now

$$\begin{bmatrix} T_k^{-r} x_k \\ 1 \end{bmatrix} + \begin{bmatrix} T_k^{-(r+1)} x_k \\ 1 \end{bmatrix} = \begin{bmatrix} (1 + T_k^{-1}) T_k^{-r} x_k \\ 0 \end{bmatrix}$$

so that the sum of two consecutive vectors on this cycle
gives a vector on the other cycle, and this is exactly
the structure required.


Example

A code of message length 7 with 4 check digits can be
obtained by taking

$$M(kT) = T^3 + T + 1 \tag{18}$$

so that the code is described by the characteristic equation

$$(T + 1)(T^3 + T + 1) = 0$$
$$T^4 + T^3 + T^2 + 1 = 0. \tag{19}$$

This code is capable of correcting single and double
adjacent errors. The basic shift register is shown in Fig. 5.
The coding equations are explicitly

$$a_{4+i} + a_{3+i} + a_{1+i} = 0, \quad i = 0, \cdots, 3.$$


Fig. 5.


The detector in this decoder has to detect the states

$$\begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \end{bmatrix}.$$

This is equivalent to detecting two zeros in the first two 
shift elements and a one in the last, and this is in ac- 
cordance with the rules for detectors to perform burst 
correction. 
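The requirement that all single and double adjacent errors give distinct register states can be verified directly for this example. The sketch below assumes the characteristic equation reads $T^4 + T^3 + T^2 + 1 = 0$, that is, connections (1, 1, 0, 1); the helper names are hypothetical.

```python
def unshift(state, c):
    """Apply T^{-1}: undo one shift of the register (requires c[-1] == 1)."""
    last = state[0]
    for i in range(len(c) - 1):
        last ^= c[i] & state[i + 1]
    return state[1:] + [last]


def error_states(c, n):
    """States z for single errors and for double adjacent errors."""
    v = [1] + [0] * (len(c) - 1)
    cols = []
    for _ in range(n):
        v = unshift(v, c)
        cols.append(tuple(v))            # cols[r-1] is T^{-r}x
    doubles = [tuple(a ^ b for a, b in zip(cols[r], cols[r + 1]))
               for r in range(n - 1)]
    return cols, doubles


singles, doubles = error_states([1, 1, 0, 1], 7)
print(singles[0], doubles[0])            # the two states the detector watches
print(len(set(singles + doubles)))       # 13 patterns, all distinct
```

Because the 13 states are distinct and nonzero, each of these error patterns can be located and corrected without ambiguity.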


Fire Codes [3] 

There exists an extension of the double adjacent error 
correction codes just described. These are derived from 
a characteristic equation 


(T’ + 1)M(kT) = 0 (20) 


where p is prime to $2^k - 1$. This gives rise to codes of
length $p(2^k - 1)$ with p + k check digits. The chief
virtue of this form is that the resulting cycle structure 
is easy to analyze. For this purpose, it is best to consider
the p + k by p + k matrix T defined by

$$T = \begin{bmatrix} T_k & 0 \\ 0 & T_p \end{bmatrix} \tag{21}$$

where $T_k$ satisfies $M(kT_k) = 0$, and $T_p$ is the p by p
matrix

$$T_p = \begin{bmatrix} 0 & 0 & \cdots & 0 & 1 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}. \tag{22}$$

As before, this T is different from the matrix usually
considered, but it satisfies the same characteristic equation,
and hence it has the same cycle structure.

The basic cycle of length $p(2^k - 1)$ may be taken as
consisting of vectors

$$\begin{bmatrix} T_k^{-r} x_k \\ T_p^{-r} x_p \end{bmatrix}$$

where $x_k$ is the k by 1 vector $(1, 0, \cdots, 0)^T$
and $x_p$ is the p by 1 vector $(1, 0, \cdots, 0)^T$.

Then, the vector 


Cao at] ®| 


which is a linear combination of vectors on this basic 


cycle, will be found to be 
Reta DS cas Jes) 


1 


Further, all other members of the cycle to which this
vector belongs will contain in the lower p positions just
a cyclic permutation of the p values shown in this vector.

Thus, in general, the vectors 


> a | 
a=1 2 
will be found to be on different cycles for different sets of
$q_i$, each cycle being of length $p(2^k - 1)$, and this is exactly
the structure required for a code that corrects sets of
errors $1 q_1 q_2 q_3 \cdots$.
The cycle will indeed be of length $p(2^k - 1)$, provided
first that the q's are such that p cyclic shifts are required
before the pattern shown repeats itself, and second that
$(1 + \sum_{i \ge 1} q_i T_k^{-i}) x_k \ne 0$. This latter condition may be
ensured by taking a characteristic equation of sufficiently
high degree for $T_k$.


Example


When p = 5, it is found that errors 1, 11, 101, 111,
1111, and 1011 or 1101 may be corrected.

Thus, the characteristic equation

$$(T^5 + 1)(T^5 + T^2 + 1) = 0$$
$$T^{10} + T^7 + T^2 + 1 = 0 \tag{23}$$

gives a code of message length 155, containing 10 check
digits, that is capable of correcting errors of the above
form.


Fig. 6.


The shift connections are shown in Fig. 6, and the
detector has to detect six patterns. These patterns may
easily be calculated and are in fact

    0000000001   for error pattern 1
    0000000011   for error pattern 11
    0000000100   for error pattern 101
    0000000110   for error pattern 111
    0000001100   for error pattern 1111
and 0000001001   for error pattern 1101
or  0000001110   for error pattern 1011.


It is seen that the detector for the first four patterns 
can be exactly the detector for burst patterns of length 3. 
The other two patterns must be detected separately. 
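The listed patterns can be recomputed by summing the vectors $T^{-i}x$ for the positions marked in each error pattern. The sketch below assumes the ten-stage register corresponds to $T^{10} + T^7 + T^2 + 1 = 0$, so that the connections are c = (0, 0, 1, 0, 0, 0, 0, 1, 0, 1); the helper names are hypothetical.

```python
C_FIRE = [0, 0, 1, 0, 0, 0, 0, 1, 0, 1]   # assumed: T^10 + T^7 + T^2 + 1 = 0


def unshift(state, c):
    """Apply T^{-1}: undo one shift of the register (requires c[-1] == 1)."""
    last = state[0]
    for i in range(len(c) - 1):
        last ^= c[i] & state[i + 1]
    return state[1:] + [last]


def detector_state(pattern, c):
    """Register state to detect for an error pattern such as '1101'."""
    v = [1] + [0] * (len(c) - 1)
    total = [0] * len(c)
    for bit in pattern:                    # i-th symbol selects T^{-i}x
        v = unshift(v, c)
        if bit == "1":
            total = [s ^ t for s, t in zip(total, v)]
    return "".join(str(b) for b in total)


for pattern in ["1", "11", "101", "111", "1111", "1101", "1011"]:
    print(detector_state(pattern, C_FIRE), "for error pattern", pattern)
```

Under these assumptions the printed states agree with the seven listed above, and the first four have the seven leading zeros expected of the simple burst detector for length 3.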
If this code were used in a practical situation, a good
way of using it would be to use the simple detector for
bursts of length 3, so that just bursts 1, 11, 101, and 111
are corrected. It will be remembered that after the entire
correction has taken place, the contents of the feedback
shift register in the decoder is zero. Hence, if the message
has been sent correctly, or if errors 1, 11, 101, 111 have
occurred and have been corrected, the shift register will
be zero. Therefore, if at the end of this correction
procedure, the register is not zero, this clearly indicates
that some other error has occurred, and so this system
may be used to detect the presence of a number of further
errors.


GENERAL BURST CORRECTING CODES


It is clear that the Fire code [3] just described will
correct all errors within a burst length (p + 1)/2, p odd;
p/2, p even, as well as many more. This gives a
straightforward way of constructing a general burst code.

For example, by taking


a) eee a ey eg) ne Oa 


it is possible to obtain a code of message length 693, 17
digits of which are check, and the code is capable of
correcting all burst errors up to length 6, as well as many
more.

It should be recalled that the apparatus required in the
encoder and decoder for this is essentially a shift register
of length 17, which is reasonable, while the decoder must
also contain a shift buffer of length 693.

It should be observed that though this Fire code may
seem inefficient for burst correction, in this example only
about 3 more check digits are used than an optimum code
would employ, and the difference seems immaterial.

As in the previous example, it is very simple to employ
such a code as a burst correcting code, using the simplified
detector in the decoder that has been described. The
remaining redundancy that the code contains can then be
used for the detection of other errors, and this is done by
examining the shift register at the end of the correction
procedure to see whether it is zero.


BOSE-CHAUDHURI CODES


Bose and Chaudhuri [5] have given a theory of cyclic
codes for general multiple error correction. Their theory
shows that if the characteristic equation is taken as

$$M(kT) N(T) = 0 \tag{25}$$

where N(S) = 0 is the characteristic equation that
$S = T^3$ satisfies, and T satisfies M(kT) = 0, then the
resulting code is able to correct all double errors.
For example, if

$$M(kT) = T^4 + T + 1 \tag{26}$$

this implies

$$T^{12} + T^9 + T^6 + T^3 + 1 = 0.$$
Thus
$S = T^3$ satisfies

$$S^4 + S^3 + S^2 + S + 1 = 0. \tag{27}$$

Consequently, the characteristic equation

$$(T^4 + T + 1)(T^4 + T^3 + T^2 + T + 1) = 0$$
$$T^8 + T^7 + T^6 + T^4 + 1 = 0 \tag{28}$$

according to the theory of Bose and Chaudhuri gives a
code of message length 15, with 8 check digits, which is
capable of correcting all double errors.
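Products of characteristic equations such as the one above are easy to check by multiplying polynomials with coefficients taken modulo two. In the sketch below a polynomial is held as an integer whose bit i is the coefficient of $T^i$; the function names are hypothetical, and the factors $T^4 + T + 1$ and $T^4 + T^3 + T^2 + T + 1$ are the assumed reading of this example.

```python
def gf2_mul(a, b):
    """Multiply two polynomials over GF(2), each stored as a bit mask."""
    result = 0
    while b:
        if b & 1:
            result ^= a                  # add (XOR) the shifted copy of a
        a <<= 1
        b >>= 1
    return result


def show(poly):
    """Render a bit-mask polynomial in powers of T."""
    terms = []
    for i in range(poly.bit_length() - 1, -1, -1):
        if (poly >> i) & 1:
            terms.append("1" if i == 0 else "T" if i == 1 else "T^%d" % i)
    return " + ".join(terms)


print(show(gf2_mul(0b10011, 0b11111)))    # (T^4+T+1)(T^4+T^3+T^2+T+1)
print(show(gf2_mul(0b11, 0b1011)))        # (T+1)(T^3+T+1)
print(show(gf2_mul(0b100001, 0b100101)))  # (T^5+1)(T^5+T^2+1)
```

The coefficients of the product, read from the second-highest power downwards, give the connections $c_1, \cdots, c_k$ of the corresponding register.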


Fig. 7.


Fig. 7 shows the shift register connections. Explicitly,
the code is

$$a_{8+i} + a_{7+i} + a_{5+i} + a_{1+i} = 0 \quad \text{for } i = 0, \cdots, 7.$$

The detector in the decoder must, of course, detect all
14 states of the form

$$T^{-1} x + T^{-r} x$$

for $r = 2, \cdots, 15$.
The same theory shows that a triple error correcting 
code is achieved if the characteristic equation is taken as 


M(kT)N(T)P(T) = 0 (29) 


where P(S) = 0 is the equation that S = T° satisfies. 
Thus, in the previous example, 


PS) = +841 (30) 


and it is seen that in this way a triple error correcting 
code can be constructed with message length 15, 10 
digits being check digits. 

Unfortunately, the number of different patterns to be 
corrected is now very large (~15^3), so the detector loses
its simplicity. Other schemes then, in fact, become feasi- 
ble, but to describe these is beyond the scope of this paper. 


EXPANDED DOUBLE ERROR CORRECTING CODES


The previous theory shows how to construct double error
correcting codes of message length 2^p - 1, where there
are 2p check digits or less. It is, in fact, possible to con-
struct double error correcting codes of length 2^p + 1
using 2p check digits, provided 2^p + 1 is not divisible by 3.

The necessary results will merely be quoted. It may be
shown that the characteristic equation

    T^2 + aT + 1 = 0                                    (31)

where a is a Galois field element belonging to GF(2^p)
gives a cycle structure which consists of 2^p - 1 cycles
each of length 2^p + 1. This theory is like the previous
theory, but vectors and matrices have elements belonging
to GF(2^p).

Further, if 2^p + 1 is not divisible by 3, it may be proved
that the vectors 


a Lee en x 


all lie on different cycles for i = 1, 2, 3 ... 2^(p-1). Thus
the cycle structure is suitable for building a double error 
correcting code. 

T may be thought of as a 2 by 2 matrix with elements in 
GF(2^p), or a matrix representation may be taken for the
field elements, in which case T may be thought of as a 
2p by 2p matrix with elements zero or one. This matrix 
will have the same cycle structure, so the characteristic 
equation is required for this 2p by 2p matrix. This may be 
obtained at once by using the equation that a satisfies to 
eliminate a from (31). 

An example will perhaps clarify these points. Let a be
defined by

    a^4 + a + 1 = 0                                     (32)

so that a belongs to GF(2^4). Then, since from (31),
a = T^-1 + T, on substitution

    (T^-1 + T)^4 + (T^-1 + T) + 1 = 0.

Consequently

    T^8 + T^5 + T^4 + T^3 + 1 = 0.                      (33)

Thus, using the theory, a code having the shift con-
nections shown in Fig. 8 will have 17 message digits, 8
of them check, and will be capable of correcting all
double errors.
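
The elimination of a can be reproduced step by step. The sketch below is only an illustration under stated assumptions: a GF(2) Laurent polynomial in T is stored as a set of exponents (negative exponents allowed), and the helper names laurent_mul and gf2_mod are hypothetical. It substitutes a = T^-1 + T into a^4 + a + 1 = 0, clears the negative powers, and then checks that the result divides T^17 + 1, as a cycle length of 17 requires.

```python
def laurent_mul(p: set, q: set) -> set:
    """Multiply GF(2) Laurent polynomials stored as sets of exponents."""
    out = set()
    for i in p:
        for j in q:
            out ^= {i + j}   # equal terms cancel in pairs modulo 2
    return out


def gf2_mod(a: int, m: int) -> int:
    """Remainder of a divided by m, both GF(2) polynomials as bit masks."""
    deg_m = m.bit_length() - 1
    while a and a.bit_length() - 1 >= deg_m:
        a ^= m << (a.bit_length() - 1 - deg_m)
    return a


a = {-1, 1}                                  # a = T^-1 + T
a4 = laurent_mul(laurent_mul(a, a), laurent_mul(a, a))
lhs = a4 ^ a ^ {0}                           # a^4 + a + 1
shifted = sorted((e + 4 for e in lhs), reverse=True)   # multiply by T^4
print(shifted)

mask = sum(1 << e for e in shifted)          # bit-mask form of the polynomial
print(gf2_mod((1 << 17) | 1, mask) == 0)
```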


Fig. 8. 


Explicitly, the coding equations will be 
Choe: 1 O74 Tt Osu; 1 Ose, 1 Oe aay, = 0 


1OT aOR ware eds 
This code is thus seen to be slightly superior in efficiency 
to the corresponding Bose-Chaudhuri [5] one. 


(34) 


SINGLE CHARACTER CORRECTING CODES 


Hitherto, all the theory has applied to binary messages,
and as a consequence, modulo 2 arithmetic has been used
to describe what happens. This is no hardship, of course,
for messages usually are binary. However, there is a
straight generalization of the theory in which the digits
of the message take q values (q prime). The shift registers
have elements with q stable states, and the characteristic
equation describing the code has coefficients that belong to
a prime field. This extension seems to have only academic
interest, but there is a further similar extension in which
the elements of the message take 2^p values. The shift
registers have elements with 2^p stable states, and the
characteristic equation has coefficients that belong to the
Galois Field GF(2^p). This extension is interesting because,
when a binary message is transmitted p digits at a time,
instead of serially, this is exactly the situation. Conse-
quently, it is profitable to investigate whether any of these
extended equations produce worthwhile codes, and also
whether these complicated shift registers can be built
out of multiple binary ones.

In the previous section, characteristic equations of this
form were introduced, but only as an aid to analysis.
Now it is being suggested that they should be used
seriously.


Example

An example will show the possibilities. Consider the
characteristic equation

    T^2 + bT + 1 = 0                                    (35)

where b belongs to GF(4), and b is defined by

    b^2 + b + 1 = 0

so that

    b^3 = 1.                                            (36)

(If the theory of the last section is applied to this, it is
found that a double error correcting code of message
length 5, 4 of them check digits, is obtained. There is,
thus, only one information digit, and the two coded
messages it is possible to send are found to be 00000 and
11111. It is thus trivial that double error correction is
possible.)

If, however, (35) is used in its own right, T is taken
to have the form

        | 0  1 |          | 0 |
    T = |      |,     x = |   |                         (37)
        | 1  b |          | 1 |

and it is found that T^5 x = x.
Thus, the coding equations of the form (4) become

        | 0 |       | 1 |       | b |       | b |       | 1 |
    a_0 |   | + a_1 |   | + a_2 |   | + a_3 |   | + a_4 |   | = 0    (38)
        | 1 |       | b |       | b |       | 1 |       | 0 |

and the message length is 5, while the number of check
digits is 2. The a's of course take values 0, 1, b, b^2.

The basic shift register for this code is shown in Fig. 9.
The shift elements take four values 0, 1, b, b^2, and the
circle with b inside indicates multiplication by b.


Fig. 9.


The cycle structure of the register clearly consists of 3
cycles each of length 5; one cycle starts with (1, 0)^T, one
with (b, 0)^T, and one with (b^2, 0)^T.
Consequently, the 15 vectors

    b^i T^r x

are all different for different i and r (i = 0, 1, 2;
r = 0, 1 ... 4). Thus, the code defined by (38) is capable
of single character error correction, where each character
may be in error by 1, b or b^2, i.e., in any possible way.
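
The cycle structure can be enumerated directly. This is a minimal sketch under stated assumptions: the elements 0, 1, b, b^2 of GF(4) are encoded as the integers 0, 1, 2, 3 (so that addition is bitwise XOR), and one register shift is taken to be the companion-form update (u, v) -> (v, u + bv) implied by T^2 + bT + 1 = 0. The encoding and the function names are illustrative, not the paper's.

```python
def times_b(v: int) -> int:
    """Multiply a GF(4) element by b; v = A + 2*B stands for A + B*b."""
    a_bit, b_bit = v & 1, v >> 1
    # (A + B*b) * b = B + (A + B)*b, using b^2 = b + 1
    return b_bit | ((a_bit ^ b_bit) << 1)


def step(state):
    """One shift of the assumed register: (u, v) -> (v, u + b*v)."""
    u, v = state
    return (v, u ^ times_b(v))


def cycle(start):
    """List the states visited from `start` until the register returns to it."""
    states = [start]
    s = step(start)
    while s != start:
        states.append(s)
        s = step(s)
    return states


# Start from x, b*x and b^2*x with x = (0, 1).
cycles = [cycle((0, scale)) for scale in (1, 2, 3)]
for c in cycles:
    print(len(c), c)

all_states = {s for c in cycles for s in c}
print(len(all_states))   # number of distinct vectors b^i T^r x
```

Under this encoding the first cycle lists the five columns of (38) in order, and the three cycles together cover every nonzero register state once.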


Encoder for Example 


The encoder for this problem has exactly the form of 
the basic one shown in Fig. 1, but incorporates the register 
shown in Fig. 9. 


Decoder for Example 


The decoder has exactly the form of that shown in Fig.
2, except for the detector. It is seen that for an error of 1,
the detector has to detect the state (0, 1)^T; for an error b,
the state (0, b)^T; for an error b^2, the state (0, b^2)^T.

Hence, the simplest arrangement is that shown in Fig. 
10. The detector now has to determine not only whether to 
correct the output from the main register, but by how 
much. 

The AND gate operates when the contents of the first 
element of the register is zero, and allows the contents
of the second element to be added to the output from the 
main shift register. 


CONSTRUCTION OF FOUR-STATE SHIFT REGISTER
ELEMENTS


The construction of an encoder and decoder for this 
problem is now clear in principle, but it is not yet clear 
how to actually construct the basic shift register in Fig. 11.


Fig. 10.


Fig. 11.


For this, a representation of the elements of the Galois
Field is used. Any character of the message, or any element
of the register, may be written


    y = (A + Bb)


where A and B take values zero and one. y may be repre-
sented by the vector (A, B).


Thus, the state 0 is represented by (0, 0).
Thus, the state 1 is represented by (1, 0).
Thus, the state b is represented by (0, 1).
Thus, the state b^2 is represented by (1, 1).


When two elements y; and y; are added together, as 
they are in the adder of the shift register, then clearly 
the A’s and B’s of the representation are added separately. 

Further, when an element y is multiplied by b, it is
found that


    by = (Ab + Bb^2)
       = B + (A + B)b                                   (39)


so that the vector representing by is (B, A + B).
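
The two binary relations just stated are small enough to write out. The following sketch assumes the (A, B) representation given above; the function names add and times_b are illustrative.

```python
def add(y1, y2):
    """Add two GF(4) elements: the A's and the B's are added separately, modulo 2."""
    return (y1[0] ^ y2[0], y1[1] ^ y2[1])


def times_b(y):
    """Multiply by b: (A, B) -> (B, A + B), as in (39)."""
    a_bit, b_bit = y
    return (b_bit, a_bit ^ b_bit)


ONE, B, B_SQUARED = (1, 0), (0, 1), (1, 1)

print(times_b(ONE))                # b * 1
print(times_b(B))                  # b * b
print(times_b(B_SQUARED))          # b * b^2
print(add(add(B_SQUARED, B), ONE)) # b^2 + b + 1
```

Each operation needs only modulo 2 adders and a rewiring of the two binary lines, which is why the four-state element can be built from binary components.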

Thus, the shift register may be built from binary 
components that impose the correct relations on the 
A’s and B’s. The binary representation of the basic shift 
register is shown in Fig. 11.


GENERAL SINGLE CHARACTER CORRECTING CODES


The ideas illustrated by this example generalize very
easily. When the characters in the message take 2^p values,
it is necessary first to find a characteristic equation of
degree k with coefficients in GF(2^p) that has a cycle struc-
ture of 2^p - 1 cycles, each of length h = (2^(pk) - 1)/(2^p - 1),
and this can, in general, be done. When this is done, the
cycles can be written out and it will be found that either
the cycle that starts with (1, 0, ... 0)^T contains
(b, 0, ... 0)^T, (b^2, 0, ... 0)^T, etc., or
that it does not. If it does not, then clearly the
result is a single character correcting code that corrects
errors 1, b, b^2, etc. If it does, then the theory breaks down,
but clearly 2^p - 1 must be a factor of h. Hence, this is a
general method of constructing single character correcting
codes, provided 2^p - 1 is not a factor of h.


Examples

1) Characters taking 4 values; b^2 + b + 1 = 0.          (40)
   T^2 + bT + 1 = 0 gives a code length
   5 with 2 check characters.                            (41)


T* + OT® + dT’? + OT + 1 = O gives a code 
length 85 with 4 check characters. (42) 


Note that the above restriction prevents the finding of a 
code of length 21 with 3 check digits by this method. 


2) Characters taking 8 values. The
   field is defined by c^3 + c + 1 = 0.                  (43)
   T^2 + cT + 1 = 0 gives a code length
   9 with 2 check characters.                            (44)
   T^3 + cT + 1 = 0 gives a code length
   73 with 3 check characters.                           (45)

3) Characters taking 16 values. The
   field is defined by a^4 + a + 1 = 0.                  (46)
   T^2 + aT + 1 = 0 gives a code length
   17 with 2 check characters.                           (47)
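
The code lengths quoted in these examples follow from the formula for h, and the stated restriction can be checked the same way. This is a small sketch, assuming h = (2^(pk) - 1)/(2^p - 1) with p bits per character and k check characters; the function names are illustrative.

```python
def cycle_length(p: int, k: int) -> int:
    """Cycle length h for 2^p-valued characters and a degree-k equation."""
    return (2 ** (p * k) - 1) // (2 ** p - 1)


def method_applies(p: int, k: int) -> bool:
    """The construction is stated to work when 2^p - 1 is not a factor of h."""
    return cycle_length(p, k) % (2 ** p - 1) != 0


for p, k in [(2, 2), (2, 3), (2, 4), (3, 2), (3, 3), (4, 2)]:
    print(p, k, cycle_length(p, k), method_applies(p, k))
```

The case p = 2, k = 3 gives h = 21, which is divisible by 3; this is the length-21 code that the text notes cannot be found by this method.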


Thus, it is possible to construct in this way many codes 
for correcting single character errors in information that is 
transmitted in parallel. There is, of course, no reason why 
these ideas should not be applied to the construction 
of burst error correcting codes with messages whose 
characters take 2^p values. However, no suitable characteristic
equations have yet been found. These ideas may
perhaps be applied very profitably to magnetic tape 
recording where binary information is handled in parallel.


CONCLUSION


This completes the description of codes to be given. 
It is emphasized again that the apparatus needed for 
their implementation is as little as could reasonably be 
hoped for, and yet they have the power to turn an un- 
usable communication system into a usable one. 

The main problem for the future is to list the different 
shift register connections, together with the usable codes 
they produce, and to match these codes with the line 
characteristics that are found from measurements on 
communications systems. 

Of the codes that have been described, those that 
promise most for practical situations seem to be the Fire 
codes, where bursts of errors up to length p are corrected, 
while the remaining potentialities of the code are used 
for additional error detection. In this case, there is the 
greatest simplicity, and additional protection is provided. 
Telephone engineers and others are understandably re- 
luctant to employ a pure correcting code, because they 
are worried about the possible occurrence of catastrophic 
errors that might escape unnoticed, if some error detection 
were not included. 

The Bose-Chaudhuri codes should be used in cases 
where errors occur randomly. It is fair to repeat, as has 
been pointed out, that the decoder for a double error 
correcting code is a little cumbersome, since its detector 
has to detect a fairly large number of patterns, while 
for a triple error correcting code, the number of patterns 
becomes very large. 

It may well be that an optimum system contains a 
hierarchy of codes of the type described here, and that a 
message should be coded several times over at different 
levels. However, this is a problem for the future.


On the Optimum Range Resolution of
Radar Signals in Noise*


N. J. NILSSON, MEMBER, IRE 


Summary—Optimum radar resolution is recognized to be a 
problem in distinguishing between different possible target con- 
figurations. Radar reception systems which perform optimum range 
resolution are then designed using the principles of statistical 
decision theory. In particular, the design of the optimum resolution 
system is carried out for a squared-error loss function, modified to 
provide extra penalties for wrong guesses about the number of 
targets present. Such a system is capable of simultaneously deciding 
the number of targets present, their spatial positions (ranges) and 
their relative amplitudes. The analysis also includes a discussion 
of an optimum device for the resolution of distributed (clutter-like) 
targets. 


INTRODUCTION 


THE ability to resolve multiple-echo signals in time
determines the range resolution of radar systems.

The resolution rule-of-thumb for pulse radars is 
that echo pulses separated by a pulse length can be 
resolved, but echo pulses which overlap to any significant 
extent appear as only one target. Woodward¹ has developed
a generalization of this rule which is applicable
even for radar (sounding) signals whose time-bandwidth 
products are larger than unity. After defining the so-called 
Radar Ambiguity Function, Woodward infers that the 
time resolution cell for any signal is equal to the reciprocal 
of the (sounding) signal bandwidth. 

However, this classical definition of resolution takes 
only qualitative account of the fact that the signals are 
embedded in noise. Even very narrow-band signals 
should be resolvable for arbitrarily small time separations 
in the complete absence of noise. We intuitively expect, 
then, that our ability to resolve two or more known 
signals in noise should depend, not only on the signal 
bandwidth, but on the echo-to-noise power ratio. In this 
paper, we shall treat the resolution problem as a problem 
of combined signal detection and _ signal estimation. 
Systems which achieve optimum range resolution will 
be derived, and their characteristics compared with 
those systems which are optimum only in the single- 
target detection sense. 

Various authors have recognized the need for treating 
the general signal resolution problem in a more precise 
fashion than can be done by appeals to Woodward's 


* Received by the PGIT, December 7, 1960. The research
reported in this paper was conducted during the author’s recently
completed term of active duty at Rome Air Dev. Ctr., Griffiss AFB,
Rome, N. Y.

† Stanford Res. Inst., Menlo Park, Calif.

¹ P. M. Woodward, “Probability and Information Theory with
Applications to Radar,” McGraw-Hill Book Co., Inc., New York,
N. Y.; 1953.


ambiguity function alone. Swerling² has analyzed the
problem of resolving two radar targets at the same range,
but at (slightly) different angles within the antenna
beamwidth. Helstrom³,⁴ discusses the problem of distinguishing
between two noise-corrupted signals whose
form and location are known exactly. In this paper, we 
shall attempt first to formulate a suitable definition of 
radar resolution and then to apply this definition to the 
design of systems which perform optimum radar range 
resolution. 

What is meant by “radar resolution?” A radar system 
achieving ‘‘good’”’ resolution should be able to provide 
continuous and reliable answers to the following four 
questions which specify the target configuration: 


a) How many (point) targets are there? 

b) What are their relative (spatial) positions? 

c) What are their relative velocities?

d) What are their relative amplitudes (cross sections)? 


We define resolution in the following way. An ensemble
of target configurations (specified by the set of all situations
which are allowable answers to the above four questions)
is radar resolvable with average loss ℒ, relative to a certain
level of additive noise, a certain sounding signal, and a
certain loss function L, if their respective composite echo
returns can be distinguished by the Bayes decision device with
minimum average loss ℒ.

Distinguishing among the many different possible 
target configurations is a problem in statistical decision 
theory. After defining the loss function L incurred for
wrong guesses about the target configuration, we may 
calculate the average loss for any decision system. De- 
cision systems with the lowest average loss are called 
Bayes decision systems and, therefore, according to the 
above definition, achieve the best resolution. 

This definition of resolution prompts us to ask three 
more questions: 


1) What is the optimum or Bayes radar resolving
system relative to the noise, the sounding signal,
and the loss function?


² P. Swerling, “The resolvability of point sources,” in “Proceedings
of Symposium on Decision Theory and Applications to Electronic
Equipment Development, Vol. I,” Rome Air Dev. Ctr., Griffiss
AFB, Rome, N. Y., RADC-TR-60-70A; April, 1960.

³ C. W. Helstrom, “The resolution of signals in white Gaussian
noise,” Proc. IRE, vol. 43, pp. 1111-1118; September, 1955.

⁴ C. W. Helstrom, “Statistical Theory of Signal Detection,”
Pergamon Press, New York, N. Y., ch. X; 1960.

2) How well does it resolve (what is the minimum
average loss) relative to the noise and the sounding 
signal for the given loss function? 

3) For what (which) particular sounding signal(s) is 
the resolution best? 


The present paper is devoted to finding the optimum
system asked for in question 1). The optimum system
should provide answers to questions a)-d) about the
target configuration and distinguish optimally between
different target configurations. We shall simplify the
analysis, however, by assuming that all targets have the
same angle and have zero velocity; that is, we inquire
only about the number of targets, their relative ranges
and their relative amplitudes. Questions 2) and 3), which
have to do with quality of resolution and optimum sound-
ing signal selection, are beyond the scope of this paper.
Future treatment of these interesting questions should
provide answers which will replace the rule-of-thumb
about the reciprocal of the signal bandwidth with much
more precise statements about resolution quality.⁵


BAYES DECISION PROCEDURES


The Target Density Function 


When a radar (sounding) signal of form s(t) is trans- 
mitted into a target environment, the received echo will 
be a linear superposition of like signals. In this paper, it 
will be assumed that the targets are stationary so that 
the individual echo returns comprising the total received 
echo will differ only in time of arrival and amplitude, 
corresponding to different target ranges and cross sections. 
The total received echo signal S(t) can then be written as 


    S(t) = ∫_0^T A(τ) s(t − τ) dτ,    0 < t < T          (1)


where
    A(τ) = target density function, 0 < τ < T
    s(t) = transmitted (sounding) signal
    T    = maximum possible target range (in seconds),
           assumed to be much greater than the reciprocal
           of the sounding signal bandwidth.


s(t) is normalized such that

    ∫_0^T s^2(t) dt = 1.                                 (2)


A(τ) describes everything that is relevant about a
stationary target configuration. Over that range of τ
where a distributed target may exist, A(τ) is a continuous
function of τ. On the other hand, point targets are repre-
sented by Dirac delta functions. For example, if there is a
point target of amplitude A_1 at range τ_1, and another of


⁵ For an interesting study of pairwise resolution and a technique
for providing some answers to the problem of pairwise resolution 
quality, see G. W. Preston, ‘“‘The Advanced Theory of Radar 
Measurements,” Final Rept. to the Rome Air Dev. Ctr. on Contract 
AF 30(602)-2120, General Atronics Corp. Rept. No. 799-207-12; 
August 20, 1960.

amplitude A_2 at range τ_2, then

    A(τ) = A_1 δ(τ − τ_1) + A_2 δ(τ − τ_2)

and from (1)

    S(t) = A_1 s(t − τ_1) + A_2 s(t − τ_2).


In what follows, we may at times restrict the possible
target configurations to some given set. Let us call the
set of allowed target configurations 𝒜 and let A stand for
any member A(τ) of the set 𝒜. Suppose further that we
know (or may ascribe) some a priori probability measure
p[A] to each member A of 𝒜. Such a probability measure
is necessary in the Bayes decision procedure, and it is
well to state at the outset our assumptions about p[A],
even though they be of questionable merit. Sometimes we
shall also define a probability measure p[S] on the members
S in the set 𝒮 of all possible received echoes. S, of course,
stands for S(t). When there is a one-to-one correspondence
between the members of sets 𝒜 and 𝒮, then p[A] will be
identical with p[S].


The Received Datum 


We shall assume that the composite echo S(t) is ac-
companied by additive, stationary, Gaussian noise denoted
by N(t). For simplicity, let N(t) have zero mean value.
The total received waveform is then


    X(t) = S(t) + N(t),    0 < t < T                     (3)


We shall use the notation X as representing an arbitrary
received waveform X(t) belonging to some set of wave-
forms 𝒳.


Loss Functions and Bayes Decisions 


The problem posed in this paper is: after reception of
X, we must decide in an “optimum” manner which A in
𝒜 represents the actual target configuration. We shall
denote the result of this decision as Â, our estimate of A.
In this paper, we shall equate the set of all possible
estimates with the set 𝒜 of all possible target density
functions. That is, we shall never make an estimate Â
corresponding to an impossible target configuration.

We have defined an “optimum” decision as a Bayes
decision. Let us denote the Bayes decision for A as Â_b.
To make a Bayes decision, we must define a loss function
L which fixes the loss incurred for erroneous decisions.
That is, if A represents the actual target situation, but
we decide Â, then we lose an amount L[Â, A]. The Bayes
estimate Â_b is that Â which minimizes the average value
of L.⁶ Since choice of any Â also implies by (1) a com-
posite echo S(Â), we may choose to define our loss function
in terms of S(Â) and S, that is, L = L[S(Â), S]. [When it


⁶ For a general discussion of Decision Theory, see: D. Blackwell
and M. A. Girshick, “Theory of Games and Statistical Decisions,”
John Wiley and Sons, Inc., New York, N. Y.; 1954. For applications
of Decision Theory to signal detection, see: D. Middleton, “Random
processes, signals, and noise—an introduction to statistical com-
munication theory,” in “Pure and Applied Physics, Introductory
Series,” McGraw-Hill Book Co., Inc., New York, N. Y., ch. 21; 1960.

is convenient to write S(Â) explicitly as a function of time,
we shall denote it by Ŝ(t).]

It is obvious that one way to minimize the average value
of L is to choose Â as some function of X in such a way
that for every X, L averaged over the a posteriori prob-
ability measure for A, denoted by p[A | X], is minimized.⁷
That is, minimize the conditional expectation of L given
X, denoted by E_X[L]. If L is defined in terms of Â and A:


    E_X[L(Â, A)] = Σ_{A in 𝒜} p[A | X] L[Â, A]           (4)

where p[A | X], an a posteriori probability measure, is
called the a posteriori likelihood of A being the target
density function given X. The sum over the set 𝒜 repre-
sents an average over this set. p[A | X] is obtained from
the defined a priori probability measure p[A] using Bayes’
rule and the noise statistics. In case L is defined in terms
of S(Â) and S, and, if there is a one-to-one correspondence
between members of the sets 𝒜 and 𝒮, then we may write


    E_X[L{S(Â), S}] = Σ_{S in 𝒮} p[S | X] L[S(Â), S]     (5)


where p[S | X], an a posteriori probability measure, is
called the a posteriori likelihood of the composite echo S
given X. p[S | X] can be calculated from the probability
measure p[S] using Bayes’ rule and the noise statistics.
In both (4) and (5), E_X[L] is a function of Â. The Bayes
estimate Â_b minimizes E_X[L] over all other estimates Â
in the set of possible target configurations 𝒜. In order to
proceed further to see what sort of decision procedures
arise, we must assume some particular loss functions.
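
For a finite set of configurations, the minimization in (4) reduces to a few lines. The following is a minimal sketch, not the receiver of the paper: the set of configurations, the prior, the likelihood values p(X | A) for one received X, and the squared-error loss on the target count are all hypothetical numbers chosen for illustration.

```python
def bayes_estimate(prior, likelihood, loss):
    """Return the estimate minimizing the conditional expected loss E_X[L].

    prior      -- dict: configuration A -> a priori probability p[A]
    likelihood -- dict: configuration A -> p(X | A) for the received X
    loss       -- function loss(a_hat, a) giving L[a_hat, a]
    """
    evidence = sum(prior[a] * likelihood[a] for a in prior)
    # Bayes' rule: a posteriori likelihood p[A | X]
    posterior = {a: prior[a] * likelihood[a] / evidence for a in prior}

    def conditional_risk(a_hat):
        return sum(posterior[a] * loss(a_hat, a) for a in prior)

    # Estimates are restricted to the same set as the configurations.
    return min(prior, key=conditional_risk), posterior


# Hypothetical example: the configuration is just the number of targets.
prior = {0: 0.5, 1: 0.3, 2: 0.2}
likelihood = {0: 0.1, 1: 0.6, 2: 0.3}
best, posterior = bayes_estimate(prior, likelihood, lambda a_hat, a: (a_hat - a) ** 2)
print(best, posterior)
```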


ESTIMATION OF THE PARAMETERS
OF n POINT TARGETS


The Loss Function 


Before trying to design a system to guess how many
targets exist, let us assume that a known number n exists
and that we must estimate their parameters (ranges and
amplitudes). Let us define the loss function L as being
the integrated squared error in terms of the echo signal.

    L = ∫_0^T [Ŝ(t) − S(t)]^2 dt                         (6)

where S(t) is the actual echo signal (a random process),
and Ŝ(t) is the composite echo signal implied in (1) by
the choice of the estimate Â(τ). Ŝ(t) is therefore a function
of the estimate Â. Let us also restrict the set 𝒜 to those
target density functions which represent a collection of n
point targets, that is

    A(τ) = Σ_{i=1}^{n} A_i δ(τ − τ_i)                    (7)

where
    A_i = amplitude of ith target.
    τ_i = range (in seconds) of ith target, τ_i < T for all i.


⁷ Middleton, op. cit., p. 1028.

Estimation of a particular target density function A is
now achieved by estimating the components of the n-
vectors

    τ = [τ_1, τ_2, ... τ_n]  and  A = [A_1, A_2, ... A_n].

Making use of (6) and (7), and assuming a one-to-one
correspondence between the members of sets 𝒜 and 𝒮, the
conditional expectation of the loss is

    E_X{ ∫_0^T [ Σ_{i=1}^{n} Â_i s(t − τ̂_i) − S(t) ]^2 dt }

where the Â_i and τ̂_i are estimates of A_i and τ_i, respec-
tively, and the expectation is taken over the a posteriori
likelihood for the waveform S. Expansion of this ex-
pression yields

    E_X[L] = ∫_0^T [ Σ_{i=1}^{n} Â_i s(t − τ̂_i) ]^2 dt
             − 2 Σ_{i=1}^{n} Â_i ∫_0^T E_X[S(t)] s(t − τ̂_i) dt
             + ∫_0^T E_X[S^2(t)] dt.                     (8)

Since E_X[L] is an average taken over the a posteriori
likelihood for S(t), the term E_X[S(t)] in the above equation
will be a waveform or function of time.


Minimization of the Average Loss 


Since we are trying to minimize E_X[L] by choice of
Â_i and τ̂_i (i = 1, 2, ... n), we need only maximize the
expression

    J_n = 2 Σ_{i=1}^{n} Â_i ∫_0^T E_X[S(t)] s(t − τ̂_i) dt
          − Σ_{i=1}^{n} Σ_{j=1}^{n} Â_i Â_j ∫_0^T s(t − τ̂_i) s(t − τ̂_j) dt.   (9)

The above equation can be written as

    J_n = 2 Σ_{i=1}^{n} Â_i φ_i − Σ_{i=1}^{n} Σ_{j=1}^{n} Â_i Â_j λ_ij       (10)

where

    φ_i = ∫_0^T E_X[S(t)] s(t − τ̂_i) dt

and

    λ_ij = ∫_0^T s(t − τ̂_i) s(t − τ̂_j) dt
         = the “autocorrelation function” of s(t).


φ_i is obtained as the output of a filter, at time τ̂_i, which
is matched to the transmitted signal s(t) when E_X[S(t)]
is the filter input waveform.

We shall first maximize J_n by proper choice of the Â_i
and then obtain final maximization by choice of the τ̂_i.
We can differentiate (10) with respect to Â_k and set the
result equal to zero to derive the following relationship
that must hold for the maximizing amplitude estimates
Â_im:

    φ_k = Σ_{i=1}^{n} Â_im λ_ik,    k = 1, 2, ... n.      (11)

If both sides of the above equation are multiplied by
Â_km and summed over k, we obtain

    Σ_{i,k=1}^{n} Â_im Â_km λ_ik = Σ_{k=1}^{n} Â_km φ_k.  (12)

Since (12) must also be satisfied by the maximizing esti-
mates Â_im, then J_nm, maximized over the Â_i, can be written
from (10) as

    J_nm = Σ_{i=1}^{n} Â_im φ_i                          (13)

where the Â_im must satisfy (11).

It is perhaps more convenient to express the above
relationships in matrix notation. Let us represent all the
Â_im by the n-dimensional column vector

    Â_m = [Â_1m, Â_2m, ... Â_nm]^T                       (14)

and group all the λ_ij into an n × n matrix

        | λ_11  λ_12  ...  λ_1n |
    Λ = | λ_21  λ_22  ...  λ_2n |                        (15)
        |  ...   ...  ...   ... |
        | λ_n1  λ_n2  ...  λ_nn |

Then the set of (11) may be written as the matrix equation

    φ = Λ Â_m                                            (16)

where φ is the n-dimensional column vector

    φ = [φ_1, φ_2, ... φ_n]^T.                           (17)

Both φ and Λ are functions of the vector τ̂ = [τ̂_1, τ̂_2, ... τ̂_n].
The maximizing estimate Â_m must then satisfy the matrix
equation

    Â_m = Λ^-1 φ                                         (18)

where Λ^-1 is the inverse of Λ.
J_nm can also be written from (13) in vector notation as
the dot product

    J_nm = φ · Â_m                                       (19)

or, using the relationship given in (18),

    J_nm = φ · (Λ^-1 φ).                                 (20)

Eq. (20) is an expression for J_n maximized over the vector
Â. J_nm is still a function of the vector τ̂. J_nm must then be
further maximized by choice of a τ̂_b, which fixes φ_b and
Λ_b.

The final maximum for J_n can then be written as

    J_nb = φ_b · (Λ_b^-1 φ_b).                           (21)

Upon discovering the maximizing τ̂_b, we may write an
expression for the Bayes estimate Â_b from (18) in the
following way

    Â_b = Λ_b^-1 φ_b.                                    (22)

Let us summarize the above expressions by stating the
rule for finding the Bayes estimates τ̂_1b, ... τ̂_nb and Â_1b, ... Â_nb
for the positions and amplitudes of a known number n
of targets:

1) Form the quantity φ(τ) by passing E_X[S(t)] through
a filter matched to s(t).

2) Form an n-vector φ = [φ(τ̂_1), ... φ(τ̂_n)] and an
n × n matrix Λ by selecting estimates τ̂_1, ... τ̂_n in such a
way that the quadratic form J_nm = φ · (Λ^-1 φ) is maximized.
Let us say that maximization occurs for the vector
τ̂_b = [τ̂_1b, ... τ̂_nb] which determines a φ_b and a Λ_b.

3) Using the maximizing φ_b, calculate the Bayes ampli-
tude estimates Â_1b, ... Â_nb by


    Â_b = Λ_b^-1 φ_b.
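
The three steps of this rule can be sketched as a brute-force search over a grid of candidate delays. Everything specific below is an assumption made for illustration: the matched-filter output phi and the signal autocorrelation lam are passed in as functions, the candidate delays are restricted to a finite grid, the test case is noise-free, and the Gaussian-shaped autocorrelation is hypothetical. The paper itself leaves the means of carrying out the maximization open.

```python
import math
from itertools import combinations


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def bayes_point_targets(phi, lam, grid, n):
    """Search delay n-tuples on `grid`; return (J, delays, amplitudes) with largest J."""
    best = None
    for taus in combinations(grid, n):
        phi_vec = [phi(t) for t in taus]                              # step 2: vector phi
        lam_mat = [[lam(ti - tj) for tj in taus] for ti in taus]     # step 2: matrix of lambda_ij
        amps = solve(lam_mat, phi_vec)                                # amplitudes from (18)
        j = sum(p * a for p, a in zip(phi_vec, amps))                 # quadratic form (20)
        if best is None or j > best[0]:
            best = (j, taus, amps)
    return best


# Hypothetical noise-free test case: two targets, Gaussian-shaped autocorrelation.
lam = lambda d: math.exp(-d * d)
true_targets = [(2.0, 1.0), (3.0, 2.0)]          # (amplitude, delay) pairs
phi = lambda t: sum(a * lam(t - tau) for a, tau in true_targets)
grid = [0.5 * k for k in range(7)]               # candidate delays 0.0 ... 3.0

j_max, delays, amplitudes = bayes_point_targets(phi, lam, grid, 2)
print(delays, amplitudes, j_max)
```

An exhaustive search grows combinatorially with n and with the grid size, so this form is only practical for small cases.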

In short, the receiver has only to calculate the con-
ditional mean E_X[S(t)] and pass this waveform through a
matched filter to obtain φ(τ). Then, using its knowledge
of λ(τ), it performs certain maximizing operations to
obtain Â_b and τ̂_b. If S(t) is a T-second sample from a
stationary Gaussian process,⁸ E_X[S(t)] is equal to that
S_b(t) which maximizes the a posteriori likelihood p[S | X].
(In a Gaussian distribution, the mode equals the mean.)


⁸ The requirement that S(t) must be a sample from a stationary
process involves ignoring the radar “range-to-the-fourth-power” law
for reasonable ensembles of target configurations. The assumption
that S(t) is Gaussian seems reasonable if the proper rationalizations
about s(t) and p[A] are made. In particular, for any s(t), S will be
Gaussian if the A_i are Gaussian. However, the relationship between
the statistics of S(t) and A(τ) for various sounding signals should be
investigated.

Youla⁹ has shown that S_b(t) = E_X[S(t)] is given by the
following set of integral equations:
{ 


| 8) = [ Xn eo 


Ti 


) i. Bee Re (i= a) dre eat) 28) 


where
    R_S(τ) = autocovariance function of the random process
             of which S(t) is a T-second sample
and
    R_N(τ) = ⟨N(t)N(t + τ)⟩, the autocovariance function
             of the noise.


These two equations state that S_b(t) can be obtained
from a linear filter with input X(t) and impulse response
h(τ) where h(τ) is the solution to a modified Wiener-
Hopf integral equation. As the equations stand, h(τ) is not
physically realizable, but this situation can be corrected
if we are willing to tolerate, at most, a delay of T seconds
in obtaining S_b(t). In the limit as T becomes very large,
the filter specified in (23) approaches the unrealizable
case of Wiener’s least-mean-square-error filter. Finally,
to obtain φ(τ), E_X[S(t)] is passed through a filter matched
to s(t) as shown in Fig. 1.


[Block diagram: conditional mean computer, output E_X[S(t)],
followed by a filter matched to s(t), output φ(τ).]


Fig. 1—Processing for φ(τ).


Special Case: n = 1 


When it is known that there is only one target present,
the Bayes estimates for A_1 and τ_1 are obtained in a
straightforward manner. Λ = Λ^-1 = (1) in the one-di-
mensional case so that, from (20),

    J_1m = φ^2(τ̂_1).                                     (24)

That is, we must choose an estimate τ̂_1b which maximizes
φ(τ), and thus φ^2(τ). Then, from (22)

    Â_1b = φ_1b.                                         (25)

Referring to Fig. 2, we locate the maximum of φ(τ) and
equate the target strength with this maximum. Such a
procedure using a matched filter operating on X(t),
instead of on E_X[S(t)], has long been the accepted
procedure.


⁹ D. C. Youla, “The use of the method of maximum likelihood in
estimating continuous-modulated intelligence which has been cor-
rupted by noise,” IRE Trans. on Information Theory, no.
IT-3, pp. 90-105; March 1954.


Fig. 2—Bayes estimate for a single target. 


Special Case: n = 2 


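For two targets, we must make estimates $\hat{A}_{1b}$, $\hat{A}_{2b}$, $\hat{\tau}_{1b}$, and $\hat{\tau}_{2b}$. We first calculate $\boldsymbol{\Lambda}^{-1}$ from $\boldsymbol{\Lambda}$.

$$\boldsymbol{\Lambda} = \begin{pmatrix} 1 & \lambda_{12} \\ \lambda_{21} & 1 \end{pmatrix} \tag{26}$$

where $\lambda_{12} = \lambda_{21}$, because $\lambda(\tau)$ is an even function. Inverting the above matrix yields

$$\boldsymbol{\Lambda}^{-1} = \frac{1}{1 - \lambda_{12}^2} \begin{pmatrix} 1 & -\lambda_{12} \\ -\lambda_{12} & 1 \end{pmatrix} \tag{27}$$

so that from (20)

$$J_2 = \frac{\phi_1^2 - 2\lambda_{12}\phi_1\phi_2 + \phi_2^2}{1 - \lambda_{12}^2}. \tag{28}$$

We must choose a $\hat{\tau}_{1b}$ and a $\hat{\tau}_{2b}$ such that the resulting $\phi_{1b}$, $\phi_{2b}$, and $\lambda_{12b}$ maximize $J_2$. $\hat{\tau}_{1b}$ and $\hat{\tau}_{2b}$ are the Bayes estimates of the target positions. Then, using $\boldsymbol{\phi}_b$ and $\boldsymbol{\Lambda}_b^{-1}$ in (22), we calculate

$$\hat{A}_{1b} = \frac{\phi_{1b} - \lambda_{12b}\phi_{2b}}{1 - \lambda_{12b}^2} \quad\text{and}\quad \hat{A}_{2b} = \frac{\phi_{2b} - \lambda_{12b}\phi_{1b}}{1 - \lambda_{12b}^2}. \tag{29}$$

Eq. (29) is identical with the result of Helstrom* who discusses a similar resolution problem, except that Helstrom uses $X(t)$ as the input to the matched filter instead of $E_X[S(t)]$.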
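The two-target formulas can be checked numerically. The following is a minimal sketch, assuming the two matched-filter outputs and the normalized autocorrelation value between the two candidate delays are already known; the function name and the sample numbers are illustrative assumptions.

```python
def two_target_estimates(phi1, phi2, lam12):
    """Evaluate the quadratic form (28) and the amplitude estimates (29).

    phi1, phi2 : matched-filter outputs at the two candidate delays
    lam12      : normalized signal autocorrelation between the two delays
    """
    if abs(lam12) >= 1.0:
        raise ValueError("|lam12| must be below 1 for the 2x2 inverse to exist")
    det = 1.0 - lam12 ** 2
    j2 = (phi1 ** 2 - 2.0 * lam12 * phi1 * phi2 + phi2 ** 2) / det
    a1 = (phi1 - lam12 * phi2) / det
    a2 = (phi2 - lam12 * phi1) / det
    return j2, a1, a2


# Noise-free illustration: amplitudes 2 and 1 with lam12 = 0.5 produce
# phi1 = 2 + 0.5 * 1 = 2.5 and phi2 = 1 + 0.5 * 2 = 2.0.
j2, a1, a2 = two_target_estimates(2.5, 2.0, 0.5)
```

In this noise-free case the estimates recover the assumed amplitudes, and the quadratic form equals the inner product of the filter outputs with the amplitude estimates.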


Specialized Two-Target Configurations 


It is seen that a generalization of the simple matched filter is optimum when two targets of unknown positions and amplitudes are present. Maximization of (28) in two dimensions is somewhat more complex, though, than looking for the maximum output of a simple filter. $\phi_{1b}$ and $\phi_{2b}$ will not in general occur at relative maxima of $\phi(\tau)$ unless these relative maxima are very high indeed and are also separated by more than a correlation interval. (It is beyond the scope of this paper to discuss possible means of implementing a device for automatically computing $\hat{\tau}_{1b}$ and $\hat{\tau}_{2b}$; we shall limit ourselves to the task of specifying mathematically the optimum operations.) Let us study some limiting-case two-target configurations to gain some insight into properties of the optimum procedure.

1) Two Targets Known to be Separated by More Than the Correlation Interval: If $|\tau_2 - \tau_1|$ is always greater than the reciprocal of the sounding signal bandwidth, $\lambda_{12}$ will approach 0, and, from (28)

$$J_2 = \phi_1^2 + \phi_2^2. \tag{30}$$

It is obvious that (30) can be maximized by choosing for $\phi_{1b}$ and $\phi_{2b}$ the two highest peaks in $\phi(\tau)$ which are separated by more than the correlation interval. Eq. (29), with $\lambda_{12b} = 0$, is then used to compute $\hat{A}_{1b}$ and $\hat{A}_{2b}$. This technique is close to the way in which multitarget situations are handled by present-day radars.

2) Two Targets Indistinguishably Close: If it is known that $\tau_1 \approx \tau_2$, and it is only needed to find the mean range, the two-target estimation problem reduces to the one-target problem. As $\tau_1 \to \tau_2$, and thus $\lambda_{12} \to 1$ and $\phi_1 \to \phi_2$, (28) becomes

$$J_2 = \phi_1^2 \tag{31}$$

which can be maximized by finding the maximum of $\phi(\tau)$.
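The limit in case 2) is a 0/0 form when taken directly in (28). The short derivation below is a sketch of one way to see the result, under the simplifying assumption that $\phi_1 = \phi_2 = \phi$ is imposed first and $\lambda_{12} \to 1$ afterwards.

```latex
\begin{align*}
J_2 &= \frac{\phi^2 - 2\lambda_{12}\phi^2 + \phi^2}{1 - \lambda_{12}^2}
     = \frac{2\phi^2\,(1 - \lambda_{12})}{(1 - \lambda_{12})(1 + \lambda_{12})}
     = \frac{2\phi^2}{1 + \lambda_{12}}, \\
\lim_{\lambda_{12} \to 1} J_2 &= \phi^2 .
\end{align*}
```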

3) A Small Target Near a Large One: If $A_1 \gg A_2$, we may be justified in setting $\phi_{1b} = \max \phi(\tau)$, ignoring what we may later decide as estimates of $\tau_2$ and the consequent $\lambda_{12}$. Thus, $\hat{\tau}_{1b}$ is approximately the $\tau$ which maximizes $\phi(\tau)$. For any received waveform, this maximum will be some constant $\phi_{1b}$.

Regarding $\phi_1$ as a constant in (28), we may write

$$J_2 = \phi_{1b}^2 + \frac{[\phi(\tau_2) - \lambda(\tau_2 - \hat{\tau}_{1b})\,\phi_{1b}]^2}{1 - \lambda^2(\tau_2 - \hat{\tau}_{1b})}. \tag{32}$$

$J_2$ can be maximized by choice of $\tau_2$ by selecting the $\tau$ which maximizes the quantity

$$\frac{[\phi(\tau) - \lambda(\tau - \hat{\tau}_{1b})\,\phi_{1b}]^2}{1 - \lambda^2(\tau - \hat{\tau}_{1b})}.$$

The above expression describes approximately what the receiver must do to locate a small target in the presence of a large one. First, the maximum of $\phi(\tau)$ is found and its time of occurrence noted. The large target is guessed to be located at this point. Its effect is subtracted from $\phi(\tau)$, and the result is squared and divided by $[1 - \lambda^2(\tau - \hat{\tau}_{1b})]$. This new waveform is then scanned in $\tau$ for a maximum which occurs, say, at $\hat{\tau}_{2b}$. Now that $\hat{\tau}_{1b}$ and $\hat{\tau}_{2b}$ are known, (29) allows us to calculate estimates for the target amplitudes.
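The scan described above can be sketched as follows. This is an illustration only: the Gaussian-shaped autocorrelation `lam`, the exclusion zone `min_gap` around the large target (where the ratio becomes 0/0), and the synthetic noise-free data are all assumptions introduced for the example, not part of the paper.

```python
import math


def lam(tau, width=1.0):
    """Assumed normalized signal autocorrelation (Gaussian shape, lam(0) = 1)."""
    return math.exp(-(tau / width) ** 2)


def locate_small_near_large(phi, taus, min_gap=0.2):
    """Peak of phi gives the large target; then scan the ratio from (32)."""
    i1 = max(range(len(taus)), key=lambda i: phi[i])
    tau1, phi1 = taus[i1], phi[i1]
    best_tau, best_val = None, -math.inf
    for t, f in zip(taus, phi):
        if abs(t - tau1) < min_gap:
            continue  # 1 - lam^2 tends to 0 here, so the ratio is ill-conditioned
        l = lam(t - tau1)
        val = (f - l * phi1) ** 2 / (1.0 - l ** 2)
        if val > best_val:
            best_tau, best_val = t, val
    return tau1, best_tau


# Synthetic, noise-free phi(tau): amplitude 10 at delay 0 and amplitude 1 at delay 1.5.
taus = [i / 20 for i in range(-100, 101)]
phi = [10.0 * lam(t) + 1.0 * lam(t - 1.5) for t in taus]
tau1_hat, tau2_hat = locate_small_near_large(phi, taus)
```

With these assumed values the first peak falls at delay 0 and the scan places the small target at delay 1.5; the amplitudes would then follow from (29).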


A MULTITARGET DETECTION AND 
ESTIMATION PROBLEM 


The Hybrid Loss Function and Its Minimization 


One important assumption that we have made in our 
development must now be extended. So far, we have 
assumed a known number n of targets, making the problem 
one of estimation rather than one of detection. If n is 
random, we are faced with a combined detection and
estimation problem, the solution of which will lead to 
optimum resolution systems as they were defined in the 
Introduction. We will also have to modify somewhat the 
loss function given by (6) so that extra penalties can 
result when the wrong number of targets is guessed. 

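Let us consider the following loss function, written with explicit reference to the number of targets guessed to be present and actually present:

$$L[S_j(t), \hat{S}_i(t)] = \int_0^T [\hat{S}_i(t) - S_j(t)]^2 \, dt + a_{ij}. \tag{33}$$

In the above equation, the notation is the same as that in (6). The added term $a_{ij}$ is the extra loss incurred for guessing $i$ targets present when really there are $j$. Let us compose a matrix from the components $a_{ij}$:

$$\boldsymbol{\alpha} = \begin{pmatrix} a_{00} & a_{01} & a_{02} & \cdots \\ a_{10} & a_{11} & a_{12} & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}. \tag{34}$$

$\boldsymbol{\alpha}$ is an infinite matrix with elements $a_{ij}$ defined for all $i = 0, 1, 2, \cdots$ and $j = 0, 1, 2, \cdots$.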

Proceeding exactly as in the previous section, we want to minimize the conditional expectation of the loss, given $X(t)$. It is easily shown that this minimization is equivalent to maximizing the expression

$$K_n = J_n - E_X[a_{nj}] \tag{35}$$

by simultaneous choice of $n$, $\boldsymbol{\tau}$, and $\mathbf{A}$, where $J_n$ is given, for each $n$, by (10), and $E_X[a_{nj}]$ = the expectation given $X(t)$ of $a_{nj}$ over all possible numbers of targets $j$.

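The following procedure is used to maximize $K_n$. First, for each $n$, $J_{nb} = \boldsymbol{\phi}_b \cdot (\boldsymbol{\Lambda}_b^{-1}\boldsymbol{\phi}_b)$ is found by selecting an $n$-dimensional vector $\hat{\boldsymbol{\tau}}_b$ such that the resulting $\boldsymbol{\phi}_b$ and $\boldsymbol{\Lambda}_b^{-1}$ maximize the $n$-quadratic form $\boldsymbol{\phi} \cdot (\boldsymbol{\Lambda}^{-1}\boldsymbol{\phi})$. Then we calculate

$$E_X[a_{nj}] = a_{n0}\,p_X(0) + a_{n1}\,p_X(1) + \cdots + a_{nj}\,p_X(j) + \cdots = \boldsymbol{\alpha}_n \cdot \mathbf{p}_X \tag{36}$$

where

$\boldsymbol{\alpha}_n$ = a Hilbert space vector with components $a_{n0}, a_{n1}, a_{n2}, \cdots$;

$\mathbf{p}_X$ = a Hilbert space vector with components $p_X(0), p_X(1), p_X(2), \cdots$;

and

$p_X(i)$ = the a posteriori probability that $i$ targets are present given the received waveform $X(t)$.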


By using $J_{nb}$ instead of $J_n$ in (35), we have the maximized expression

$$K_{nb} = J_{nb} - \boldsymbol{\alpha}_n \cdot \mathbf{p}_X \tag{37}$$

which must be further maximized by choice of $n$. If it were not for the term $\boldsymbol{\alpha}_n \cdot \mathbf{p}_X$, $K_{nb}$ would have no maximum over $n$ because it can be shown that $J_{n+1,b} \geq J_{nb}$ for all $n$.¹⁰ In practice, the highest value of $n$ to be tested will be some finite number, probably two or three, so that all the $K_{nb}$ could, in principle, be calculated and the largest found. Once $\hat{n}$ is determined, the $\hat{n}$-vector $\hat{\boldsymbol{\tau}}_b$ gives the Bayes range estimates, and the corresponding $\boldsymbol{\phi}_b$ and $\boldsymbol{\Lambda}_b^{-1}$ are used to calculate $\hat{\mathbf{A}}_b$.

Let us now review the procedure for optimum estimation and detection in a multitarget environment:

1) The receiver calculates $\phi(\tau)$ and $\mathbf{p}_X$ from the received waveform $X(t)$. $\phi(\tau)$ is again the matched-filtered version of $E_X[S(t)]$. The calculation of $E_X[S(t)]$ for the case of random $n$ will be discussed later in greater detail. $\mathbf{p}_X$ is a vector whose $i$th component is just the a posteriori probability that $i$ targets are present.

2) From $\phi(\tau)$ and $\lambda(\tau)$ the receiver finds, for each $n$, that $\boldsymbol{\phi}_b$ and $\boldsymbol{\Lambda}_b^{-1}$ which maximize the $n$-quadratic form $\boldsymbol{\phi} \cdot (\boldsymbol{\Lambda}^{-1}\boldsymbol{\phi})$. The maxima of the $n$-quadratic forms are called $J_{nb}$.

3) The quantity $K_{nb} = J_{nb} - \boldsymbol{\alpha}_n \cdot \mathbf{p}_X$ is then calculated and maximized over $n$. That is, if $K_{\hat{n}b} > K_{nb}$ for all integers $n \neq \hat{n}$, then $\hat{n}$ is an optimum or Bayes estimate for the number of targets present.

4) The range estimates of the $\hat{n}$ targets are then the components of the $\hat{n}$-vector $\hat{\boldsymbol{\tau}}_b$ which determined the maximizing $\boldsymbol{\phi}_b$ and $\boldsymbol{\Lambda}_b^{-1}$ for the $\hat{n}$th dimension.

5) The $\hat{n}$-dimensional vector $\boldsymbol{\phi}_b$ and the $\hat{n} \times \hat{n}$ matrix $\boldsymbol{\Lambda}_b^{-1}$ are then used to calculate the Bayes amplitude estimates $\hat{A}_{1b}, \cdots, \hat{A}_{\hat{n}b}$ of the $\hat{n}$ targets by the expression $\hat{\mathbf{A}}_b = \boldsymbol{\Lambda}_b^{-1}\boldsymbol{\phi}_b$.

The above procedure completes the combined Bayes detection and estimation technique. For the loss function assumed, it provides optimum resolution. Special examples of this procedure will be considered below.


Calculation of $E_X[S(t)]$


When the number of targets $n$ is a random variable, the method of Youla⁹ cannot be directly applied to the calculation of $E_X[S(t)]$. When $n$ is random, we may not assume that the a priori probability density function for the waveform $S(t)$ is a multidimensional Gaussian distribution if there is a finite possibility that $S(t) = 0$, i.e., no targets. If the a posteriori probability for zero targets, given $X(t)$, is $p_X(0)$, and if otherwise $p[S]$ and $p[S \mid X]$ can be considered multidimensional Gaussian likelihood functions, then

$$E_X[S(t)] = [1 - p_X(0)]\,S_0(t) \tag{38}$$

where $S_0(t)$ is that $S$ which maximizes the "continuous" or Gaussian portion of $p[S \mid X]$. $S_0(t)$ can be obtained by the same linear filter described in (23).


¹⁰ For any $n$, a $J_{n+1}$ which equals $J_{nb}$ can always be obtained by choosing two of the $\tau_i$ to be equal in the $(n + 1)$-vector $\boldsymbol{\tau}$ to make it in reality an $n$-vector. Thus, $J_{n+1,b} \geq J_{nb}$.


Some Special Cases


Let us now consider some examples to illuminate the combined detection and estimation theory that has been developed. First, we must select an appropriate matrix $\boldsymbol{\alpha}$. Following somewhat the philosophy of Bennion,¹¹ who uses a loss function which penalizes a given amount for false alarms but penalizes only with the squared error for false rest, we write

$$\boldsymbol{\alpha} = a \begin{pmatrix} 0 & 0 & 0 & 0 & \cdots \\ 1 & 0 & 0 & 0 & \cdots \\ 2 & 1 & 0 & 0 & \cdots \\ 3 & 2 & 1 & 0 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}. \tag{39}$$

That is, we lose an extra amount $3a$ if we say 3 targets present when there are really none, etc.
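The choice of the number of targets with this penalty matrix can be sketched as below. It is an illustration under stated assumptions: the maximized quadratic forms are taken as already computed, only a finite number of target counts is tested, and the function name and numbers are hypothetical.

```python
def choose_target_count(j_max, p_x, a):
    """Return (n_hat, k) where k[n] = J_nb - alpha_n . p_X and n_hat maximizes k.

    j_max : j_max[n] is the maximized quadratic form for n targets (j_max[0] = 0)
    p_x   : p_x[j] is the a posteriori probability that j targets are present
    a     : unit penalty; the assumed matrix entry is a * max(n - j, 0), as in (39)
    """
    def penalty(n):
        return sum(a * max(n - j, 0) * pj for j, pj in enumerate(p_x))

    k = [jn - penalty(n) for n, jn in enumerate(j_max)]
    n_hat = max(range(len(k)), key=lambda n: k[n])
    return n_hat, k


# Hypothetical values: a second or third target adds little to the quadratic form.
n_hat, k = choose_target_count(
    j_max=[0.0, 9.0, 9.5, 9.6], p_x=[0.1, 0.6, 0.3, 0.0], a=2.0
)
```

With these numbers the penalties are 0, 0.2, 1.6 and 3.6, so the single-target hypothesis gives the largest value and is selected.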

1) Detection of a Single Target: Suppose we need only choose the larger of $K_{0b}$ or $K_{1b}$, and, if $K_{1b}$ is our choice, make estimates of the single target's range and amplitude. Such a situation is a combined single-target detection and estimation problem. From (10), guessing no target present implies $J_{0b} = 0$, so that

$$K_{0b} = -\boldsymbol{\alpha}_0 \cdot \mathbf{p}_X$$

and

$$K_{1b} = \max_\tau \phi^2(\tau) - \boldsymbol{\alpha}_1 \cdot \mathbf{p}_X. \tag{40}$$

We must choose the maximum, which leads to the rule: say a target is present if

$$K_{1b} - K_{0b} = \max_\tau \phi^2(\tau) - [\boldsymbol{\alpha}_1 - \boldsymbol{\alpha}_0] \cdot \mathbf{p}_X > 0 \tag{41}$$

or if

$$\max_\tau \phi^2(\tau) > a\,p_X(0);$$

otherwise say no target is present. If a target is declared present, $\hat{\tau}_{1b}$ and $\hat{A}_{1b}$ are calculated as before. Notice in this example that the combined detection and estimation problem involves essentially a threshold detection scheme even though the value of the threshold is a function of the received waveform [through $p_X(0)$].

2) Pairwise Resolution: To answer the question: "Are there two targets present or just one?" merely involves choosing between $K_{2b}$ and $K_{1b}$. Now the $J_{2b}$ and $J_{1b}$ are calculated using $\phi(\tau)$ and $\lambda(\tau)$. If

$$J_{2b} - J_{1b} > a\,[p_X(1)] \tag{42}$$

then say two targets are present, but if not, then say only one target is present. Estimates for $A_i$ and $\tau_i$ can then be made. If a small target is to be detected in the presence of a large one, the procedure under (32) may be used to obtain the rule: say two targets are present if

$$\max_\tau \frac{[\phi(\tau) - \lambda(\tau - \hat{\tau}_{1b})\,\phi_{1b}]^2}{1 - \lambda^2(\tau - \hat{\tau}_{1b})} > a\,p_X(1); \tag{43}$$

otherwise say only one target present. The estimates for $A_i$ and $\tau_i$ are obtained as before.

¹¹ D. Bennion, "Some Results in the Estimation of Signal Parameters," Stanford Electronics Lab., Stanford University, Stanford, Calif., Tech. Rept. No. 10; September 10, 1956.

3) Widely Separated Targets: If it is known a priori that any targets will be widely separated, $\boldsymbol{\Lambda}$ is an identity matrix, as is $\boldsymbol{\Lambda}^{-1}$. In this case, $J_{nb} = \sum_{i=1}^{n} \phi_{ib}^2$, and

$$K_{nb} = \sum_{i=1}^{n} \phi_{ib}^2 - \boldsymbol{\alpha}_n \cdot \mathbf{p}_X. \tag{44}$$

The following procedure may be used to maximize $K_{nb}$ by choice of $n$:

a) The receiver computes $\phi(\tau)$ and finds its maximum, $\phi_{1b}$. If $\phi_{1b}^2 > a\,p_X(0)$, a target is announced and its parameters are computed.

b) The next highest peak in $\phi(\tau)$ which is more distant from $\hat{\tau}_{1b}$ than a correlation interval is selected as $\phi_{2b}$. If $\phi_{2b}^2$ is greater than $a[p_X(0) + p_X(1)]$, then a second target is announced and its parameters are computed.

c) This process is repeated until the threshold condition is not satisfied, giving a decision of, say, $\hat{n}$ targets and their parameters.
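Steps a) to c) amount to a sequential threshold test whose threshold grows with each accepted target. The sketch below assumes the squared peak heights have already been extracted, sorted in decreasing order, and are mutually separated by more than a correlation interval; the function name and the numbers are illustrative only.

```python
def count_separated_targets(peaks_sq, p_x, a):
    """Accept the k-th peak (k = 0, 1, ...) while peak > a * sum(p_x[0..k]).

    peaks_sq : squared peak heights of phi(tau), in decreasing order
    p_x      : p_x[j] is the a posteriori probability that j targets are present
    a        : unit penalty of the matrix in (39)
    """
    cumulative = 0.0
    count = 0
    for k, peak in enumerate(peaks_sq):
        # Threshold for announcing target k + 1 is a * [p_X(0) + ... + p_X(k)].
        cumulative += p_x[k] if k < len(p_x) else 0.0
        if peak > a * cumulative:
            count += 1
        else:
            break
    return count


# Hypothetical example: thresholds are 0.5, 2.0 and 5.0 for the three peaks.
n_targets = count_separated_targets([9.0, 4.0, 0.5], [0.1, 0.3, 0.6], a=5.0)
```

The first two peaks clear their thresholds and the third does not, so two targets are announced.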


Discussion 


We have proposed an optimum technique for simultaneously performing the generalized detection (how many targets) and the estimation (where are they and what are their amplitudes) problems. The detection problem involves the maximization of a set of quadratic forms ($n = 0, 1, 2, \cdots$). From each maximized quadratic form, a constant is subtracted which is a function of the a posteriori probability of a certain number of targets being present. Then, the resulting terms are compared to find the largest. We have assumed a modified squared-error loss function which inflicts extra penalties for false guesses about the number of targets present.

The receiver must compute the conditional expectation of the composite echo signal as well as the a posteriori probabilities for various numbers of targets being present. The conditional expectation of the echo signal can sometimes be obtained by a modified type of Wiener filtering operation on the received, noise-corrupted waveform. If no assumptions about a priori echo statistics are made, the received datum (signal plus noise) is usually used in place of the conditional expectation of the composite echo signal. Such practice is a consequence of using a maximum likelihood procedure instead of a squared-error Bayes procedure.

So far, we have concerned ourselves only with point targets which cause $A(\tau)$ to take the form of delta functions. If optimum estimates of $A(\tau)$ for distributed targets, such as clutter, are to be found, a somewhat different tack must be taken. Now we shall discuss some results for this important problem.


DISTRIBUTED TARGETS 
The Turin Filter

Suppose $A(\tau)$ does not consist of point targets, but, instead, represents distributed targets such as clutter. Then a loss function involving the integrated squared error between $A(\tau)$ and the estimate $\hat{A}(\tau)$ might be appropriate. Such a problem was considered by Turin,¹² although he restricted his estimator to a linear filter.¹³ Under the assumption that $A(\tau)$ is a $T$-second sample from a stationary Gaussian random process with autocovariance function $R_A(\tau)$, the transfer function of the optimum linear estimator, as derived by Turin, is

$$T(\omega) = \frac{F^*(\omega)\,G_A(\omega)}{F(\omega)F^*(\omega)\,G_A(\omega) + G_N(\omega)}$$

where

$F(\omega)$ = Fourier transform of $s(t)$

$F^*(\omega)$ = conjugate of $F(\omega)$

$G_N(\omega)$ = power spectral density of $N(t)$ = Fourier transform of $R_N(\tau)$

$G_A(\omega)$ = power spectral density function of the random process of which $A(t)$ is a $T$-second sample = Fourier transform of $R_A(\tau)$.


Special Properties of the Turin Filter 

The filter shown in Fig. 3 is a "crispening filter." (We disregard the extra time delay needed to insure realizability.) The $F^*(\omega)$ in the numerator "compresses" the received waveform by causing phase reinforcement, while the denominator accents the high-frequency components allowing for fast rise times. For example, if the average noise power is much less than the average echo power, the Turin filter approaches an inverse filter. An inverse filter has a delta function output every time $s(t)$ occurs at the input. On the other hand, if the average noise power is much greater than the average echo power, then the Turin filter becomes a simple matched filter. No rise-time crispening can be done in the face of so much noise. In any case, if $|F(\omega)|$ is rectangular and if the noise and $A(\tau)$ are white, then the filter also becomes a matched filter.

[Block diagram: input $X(t)$, filter $F^*(\omega) / [F(\omega)F^*(\omega) + G_N(\omega)/G_A(\omega)]$, output $\hat{A}(t)$.]

Fig. 3. Turin filter. The optimum linear estimator for the target density function.
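The two limiting cases can be checked at a single frequency. The sketch below evaluates the transfer function in the form $F^* G_A / (|F|^2 G_A + G_N)$, which is the Fig. 3 expression with numerator and denominator multiplied by $G_A$; the function name and the sample spectrum value are assumptions made for illustration.

```python
def turin_filter(F, G_A, G_N):
    """Transfer function value F* G_A / (|F|^2 G_A + G_N) at one frequency.

    F   : complex value of the signal spectrum F(w)
    G_A : power spectral density of the target density process at w
    G_N : power spectral density of the noise at w
    """
    return F.conjugate() * G_A / (abs(F) ** 2 * G_A + G_N)


F = 0.5 + 0.25j  # assumed sample value of F(w)

low_noise = turin_filter(F, G_A=1.0, G_N=1e-9)   # close to the inverse filter 1 / F
high_noise = turin_filter(F, G_A=1.0, G_N=1e9)   # close to F* G_A / G_N (matched filter)
```

In the low-noise case the value approaches the inverse filter, and in the high-noise case it approaches a scaled conjugate of the signal spectrum, which is the matched filter.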


The resolution-enhancing properties of this filter come about due to its crispening action. The lower the noise-to-echo power ratio, the greater is the crispening action, and hence, the finer is the potential range (time) resolution capability. It is assumed that $G_N(\omega)$ is known, and $G_A(\omega)$ can be calculated from the statistics of the ensemble of target density functions.

¹² G. L. Turin, "On the estimation in the presence of noise of the impulse response of a random, linear filter," IRE Trans. on INFORMATION THEORY, vol. IT-3, pp. 5-10; March, 1957.

¹³ Which is no restriction under the assumption that $A(\tau)$ is Gaussian.


CONCLUSION 


The problem considered in this paper, briefly stated, concerned optimum methods of signal detection. The signal form, however, was not known exactly except for the fact that it was a composite of known signal forms echoing from a random configuration of targets. We stated in the Introduction that the solution of such a detection problem also solves the optimum resolution problem if resolution is defined to mean distinguishing between the different possible composite echoes. Various procedures were derived for making estimates of the target density function (which may either be discrete or continuous) under special assumptions regarding a priori probabilities and loss functions.

No attempt was made to evaluate different radar 
systems, including optimum ones, to determine just how 
well they are able to distinguish between different target 
configurations in the face of random noise. This problem 
was posed as question 2) in the Introduction and must 
be answered before quantitative statements can be made 
about resolution ability. Also unanswered is the problem 
of deciding what sounding signal to transmit so that 
resolution ability can be further enhanced. It is hoped 
that the present study might provide stimulation for a 
complete set of answers to these questions. In addition 
to the above two important problems, the following 
suggestions are offered as topics for further research: 


1) Investigation of the a priori and a posteriori statistics 
of S(t) for various assumptions about the statistics 
of the target configuration, the statistics of the 
noise, and the known sounding signal. 

2) Generalization of the results of this paper to include 
the other spatial dimensions (angle) and velocity. 

3) Investigation of other loss functions. 

4) Instrumentation of a two-target resolver using (28) 
and measurement of its performance in noise. 

5) Analysis of methods to calculate approximately the 
a posteriori probabilities for the number of targets 
present. 


APPENDIX 


List of Symbols


$A$, $A(\tau)$ = Target density function

$A_i$ = Amplitude of $i$th target

$\mathbf{A}$ = Target amplitude vector whose $i$th component is $A_i$

$\mathcal{A}$ = Set of all possible $A(\tau)$

$E_X[\ ]$ = Conditional expectation of $[\ ]$

$F(\omega)$ = Fourier transform of $s(t)$

$G_A(\omega)$, $G_N(\omega)$ = Power spectral density functions of $A(\tau)$ and $N(t)$, respectively

$J_n$, $J_{nb}$ = The quadratic form $\boldsymbol{\phi} \cdot (\boldsymbol{\Lambda}^{-1}\boldsymbol{\phi})$ and its maximum, respectively

$K_{nb}$ = An expression derived from $J_{nb}$ by subtraction of $\boldsymbol{\alpha}_n \cdot \mathbf{p}_X$

$L$ = The loss function defined for each combination of $[A(\tau), \hat{A}(\tau)]$ or $[S(t), \hat{S}(t)]$

$n$ = Number of point targets present

$N$, $N(t)$ = Zero-mean, stationary, Gaussian noise

$p[\ ]$, $p[\ \mid X]$ = Likelihood and conditional likelihood, respectively, of the waveform $[\ ]$

$p_X(i)$ = Conditional probability of $i$ targets present given $X$

$\mathbf{p}_X$ = A Hilbert space vector whose $i$th component is $p_X(i)$

$R_A(\tau)$, $R_N(\tau)$, $R_S(\tau)$ = Autocovariance functions of the stationary random processes $A(\tau)$, $N(t)$, and $S(t)$, respectively

$s$, $s(t)$ = Sounding (transmitted) signal form

$S$, $S(t)$ = Composite echo signal consisting of a linear superposition of weighted and delayed $s(t)$'s

$\mathcal{S}$ = Set of all possible $S(t)$ as determined by $\mathcal{A}$

$T$ = Maximum possible target range (delay time)

$X$, $X(t)$ = Received datum = $S + N$

$\boldsymbol{\alpha}$ = A loss matrix with terms $a_{ij}$ being the extra loss incurred for guessing $i$ targets when there are really $j$ targets

$\boldsymbol{\alpha}_i$ = A vector whose $j$th component is $a_{ij}$

$\delta(\tau)$ = Dirac delta function

$\lambda(\tau) = \int s(t)s(t + \tau)\,dt$, "autocorrelation function" of $s(t)$

$\boldsymbol{\Lambda}$ = A matrix consisting of terms $\lambda_{ij} = \lambda(\tau_i - \tau_j)$

$\boldsymbol{\Lambda}^{-1}$ = The inverse of $\boldsymbol{\Lambda}$

$\phi$, $\phi(\tau)$ = Matched-filtered version of $E_X[S(t)]$

$\boldsymbol{\phi}$ = A vector whose $i$th component is $\phi_i = \phi(\tau_i)$

$\tau_i$ = Range (delay) of the $i$th target

The symbol $\hat{\ }$ is used over a parameter to indicate an estimate of that parameter; for example, $\hat{A}_i$ is an estimate of $A_i$. The subscript $b$ is used with an estimate to indicate a Bayes estimate; for example, $\hat{A}_{ib}$ is the Bayes estimate of $A_i$. These symbols are also used in conjunction with terms like $\phi$ and $\lambda$ when they are evaluated using Bayes estimates. For example, $\phi_{ib} = \phi(\hat{\tau}_{ib})$. The symbols $\langle\ \rangle$ are used to indicate the (unconditional) expectation of the enclosed quantity.
Summary: A method of generating phase shift pulse codes of arbitrarily long length with zero periodic correlation except for the peak for zero shift is presented. The codes are of length $p^2$ where $p$ is any prime number, and $p$ different phase shifts corresponding to the $p$th roots of unity are necessary to generate them. Since $p$ different phase shifts are required, these codes are not as easy to generate and process as the binary codes, but this does not seem to be a serious limitation to their usefulness. Application of these codes can be made as interpulse phase modulation for range resolution in pulse Doppler radars or for a method of synchronizing a pulse code communication system.


INTRODUCTION 


BINARY codes¹ have been considered for modulating pulse trains at a carrier frequency for communication and radar systems. For example, a sequence +1, +1, −1 can correspond to either of the transmitted waveforms shown in Fig. 1. There is no reason to restrict the codes to +1 and −1 (i.e., 0° and 180°) except for the added complexity in generating and processing for a number of different phase shifts.


[Figure: pulses of RF with phases 0°, 0°, 180°.]

Fig. 1. Modulation waveforms for the code sequence +1, +1, −1.


Both single and repeated code groups can be considered. In a radar application, for example, a single code group can be used for pulse compression, or the entire time of observation of the target may be occupied by one group. On the other hand, many repetitions of a code group may occur during the time of observation, and therefore, the behavior of these repeated codes is of interest. For example, in a pulse Doppler radar, a code group may occupy an unambiguous range interval and be periodic with this interval. If this signal is then processed by a filter matched to a single code group and Doppler shifts are not considered, then the time or range response of these repeated codes is given by the periodic correlation function which is discussed in the next section. In the case of a search radar, there will be "end effects" due to the finite observation time or to the antenna scan pattern, but these effects will be small if a large number of code groups occur during the observation time.

* Received by the PGIT, January 11, 1961; revised manuscript received, June 9, 1961.

† Aerospace Group, Hughes Aircraft Co., Culver City, Calif.

¹ Binary codes consist of sequences of plus and minus ones. +1 corresponds to 0° phase shift of the carrier wave and −1 to 180° phase shift. Some examples, as well as a discussion of means of generation and processing, are given by G. L. Turin, "An introduction to matched filters," IRE Trans. on INFORMATION THEORY, vol. IT-6, pp. 311-329; June, 1960.


PERIODIC CORRELATION 


A modulation code of length $n$ can be expressed as a sequence $\{a_0, a_1, a_2, \cdots, a_{n-1}\}$. The periodic (also called serial or cyclic) correlation function is defined as the sequence $\{x_0, x_1, x_2, \cdots, x_{n-1}\}$ where

$$x_j = \sum_{k=0}^{n-1} a_{k+j}\,a_k^*.$$

Note that $a_{n+m} = a_m$, and the asterisk denotes the complex conjugate. The modulation is expressed in complex form (i.e., a phase shift of $\theta$ radians is written $e^{j\theta}$). For $j = 0$, the value of $x_j$ assumes its maximum:

$$x_0 = \sum_{k=0}^{n-1} |a_k|^2.$$

For $0 < j \leq n - 1$, the values of $x_j$ should be low; in fact, zero if possible.
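The definition translates directly into code. The following minimal sketch assumes the index convention in which the shifted element is unconjugated (the opposite convention only conjugates the result); the function name is illustrative.

```python
def periodic_correlation(a):
    """Return [x_0, ..., x_{n-1}] with x_j = sum_k a[(k + j) % n] * conj(a[k])."""
    n = len(a)
    return [
        sum(a[(k + j) % n] * complex(a[k]).conjugate() for k in range(n))
        for j in range(n)
    ]


# The binary code +1, +1, -1 used as the example in the Introduction.
x = periodic_correlation([1, 1, -1])
```

For this short binary code the peak is 3 and both off-peak values equal −1, which is the $\{n, -1, \cdots, -1\}$ pattern rather than zero off-peak correlation.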

Note that this definition of the correlation function gives the values only for integral shifts. If the pulses form a contiguous signal, then the value for an intermediate shift $j + \delta$, $0 < \delta < 1$, is:

$$x_{j+\delta} = (1 - \delta) \sum_{k=0}^{n-1} a_{k+j}\,a_k^* + \delta \sum_{k=0}^{n-1} a_{k+j+1}\,a_k^* = (1 - \delta)\,x_j + \delta\,x_{j+1}.$$

If the values of $x_j = 0$ for $0 < j \leq n - 1$, then the correlation function looks like that shown in Fig. 2.

Fig. 2. Periodic correlation function for code group of length $n$ with zero correlation except for peaks.


Elspas² has considered the generation and properties of binary codes (i.e., $a = \pm 1$) with the specified periodic correlation function $\{n, -1, -1, \cdots, -1\}$, and tabulates some of these codes up to length 127. Tompkins³ has derived some properties of ternary codes (i.e., $a = +1$, 0, or $-1$) with $x_j = 0$ for $0 < j \leq n - 1$, and he tabulates these codes up to length 18. In these studies, applications to radar systems are considered and the effects of Doppler shifts are investigated through the use of the "ambiguity" function.

² B. Elspas, "A Radar Based on Statistical Estimation and Resolution Considerations," Electronics Labs., Stanford University, Stanford, Calif., Tech. Rept. No. 361-1; August 1, 1955.


METHOD OF GENERATING⁴ THE CODES


Codes which have the value of $x_j = 0$ for $0 < j \leq n - 1$ can be generated in the following way. A prime number $p$ is chosen greater than one. There are then $p$ different $p$th roots of unity $1, \xi_1, \xi_2, \cdots, \xi_{p-1}$ (i.e., roots of the form $e^{j2\pi k/p}$ for $0 \leq k \leq p - 1$). Sequences of length $p$ are formed as follows:

$$\begin{array}{l}
1,\ 1,\ 1,\ \cdots,\ 1 \\
1,\ \xi_1,\ \xi_1^2,\ \cdots,\ \xi_1^{p-1} \\
1,\ \xi_2,\ \xi_2^2,\ \cdots,\ \xi_2^{p-1} \\
\quad\vdots \\
1,\ \xi_{p-1},\ \xi_{p-1}^2,\ \cdots,\ \xi_{p-1}^{p-1}
\end{array}$$

The code sequence is then formed by putting down in order all the first terms of each sequence above, then the second terms, then the third, etc. The resulting sequence of length $p^2$ is then:

$$1, 1, \cdots, 1,\ \ 1, \xi_1, \xi_2, \cdots, \xi_{p-1},\ \ 1, \xi_1^2, \xi_2^2, \cdots, \xi_{p-1}^2,\ \ \cdots,\ \ 1, \xi_1^{p-1}, \xi_2^{p-1}, \cdots, \xi_{p-1}^{p-1}.$$

In Appendix II it is proved that this sequence has zero periodic correlation except for the peaks at $j = 0, p^2, 2p^2$, etc. The order of the $p$ sequences from which the final sequence is formed can be changed, and also any cyclic permutation of each sequence can be substituted for each sequence before constructing the code sequence without altering this result.

A larger class of similar sequences of the $p$th roots of unity of length $m = p^n - 1$, where $n$ is a positive integer and $p$ is prime, has been described by Zierler.⁵ These sequences have the specified periodic correlation function $\{m, -1, -1, \cdots, -1\}$ and can be generated by a linear shift register of $n$ stages with each stage having $p$ states.
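The interleaving construction for the length $p^2$ codes can be sketched and checked numerically. This is an illustration under stated assumptions: the roots are ordered as $\xi_i = e^{j2\pi i/p}$, no cyclic permutations are applied, and the function names are hypothetical.

```python
import cmath


def interleaved_code(p):
    """Element m * p + i is xi_i ** m, with xi_i = exp(j * 2 * pi * i / p).

    Taking m as the outer index writes down all first terms (m = 0) of the p
    preliminary sequences, then all second terms (m = 1), and so on.
    """
    return [cmath.exp(2j * cmath.pi * i * m / p) for m in range(p) for i in range(p)]


def periodic_correlation(a):
    """x_j = sum_k a[(k + j) % n] * conj(a[k]) for j = 0, ..., n - 1."""
    n = len(a)
    return [
        sum(a[(k + j) % n] * a[k].conjugate() for k in range(n))
        for j in range(n)
    ]


code = interleaved_code(5)       # length 25
x = periodic_correlation(code)   # expected: peak of 25 at j = 0, zeros elsewhere
```

Up to floating-point rounding, the off-peak values vanish and the peak equals the code length.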


Example: p = 3 


The cube roots of unity are

$$1,\quad e^{+j(2\pi/3)},\quad e^{-j(2\pi/3)}.$$

The preliminary sequences can be

$$\begin{array}{l}
1,\ 1,\ 1 \\
1,\ e^{+j(2\pi/3)},\ e^{-j(2\pi/3)} \\
1,\ e^{-j(2\pi/3)},\ e^{+j(2\pi/3)}
\end{array}$$

The code sequence is

$$1,\ 1,\ 1,\ 1,\ e^{+j(2\pi/3)},\ e^{-j(2\pi/3)},\ 1,\ e^{-j(2\pi/3)},\ e^{+j(2\pi/3)}.$$

This sequence also has good nonperiodic correlation properties and is one of the sequences of the cube roots of unity which were found by DeLong and which have "optimum" nonperiodic correlation functions.⁶

The preliminary sequences can also be

$$\begin{array}{l}
1,\ 1,\ 1 \\
e^{+j(2\pi/3)},\ e^{-j(2\pi/3)},\ 1 \\
e^{+j(2\pi/3)},\ 1,\ e^{-j(2\pi/3)}
\end{array}$$

In this case, the code sequence is

$$1,\ e^{+j(2\pi/3)},\ e^{+j(2\pi/3)},\ 1,\ e^{-j(2\pi/3)},\ 1,\ 1,\ 1,\ e^{-j(2\pi/3)}.$$

The patient reader can verify that these perform as advertised.

³ D. N. Tompkins, "Codes with Zero Correlation," Engrg. Div., Hughes Aircraft Co., Culver City, Calif., Tech. Memo. 651; June, 1960.

⁴ Generating in the mathematical sense.

⁵ N. Zierler, "Linear recurring sequences," J. Soc. Ind. and Appl. Math., vol. 7, p. 45, example 2; March, 1959.
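The verification left to the reader can be done numerically. The sketch below builds one code from the unpermuted cube-root sequences and a second from cyclic permutations of them; the particular rotations chosen for the second code, and all function names, are assumptions made for the example.

```python
import cmath

w = cmath.exp(2j * cmath.pi / 3)  # primitive cube root of unity
rows = [[1, 1, 1], [1, w, w * w], [1, w * w, w]]


def rotate(row, s):
    """Cyclic permutation: move the first s elements to the end."""
    return row[s:] + row[:s]


def interleave(seqs):
    """All first terms of the sequences, then all second terms, and so on."""
    return [seq[m] for m in range(len(seqs[0])) for seq in seqs]


def periodic_correlation(a):
    n = len(a)
    return [
        sum(complex(a[(k + j) % n]) * complex(a[k]).conjugate() for k in range(n))
        for j in range(n)
    ]


code_a = interleave(rows)
code_b = interleave([rows[0], rotate(rows[1], 1), rotate(rows[2], 2)])
x_a = periodic_correlation(code_a)
x_b = periodic_correlation(code_b)
```

Both codes give a peak of 9 at zero shift and zero (to rounding) at every other shift, consistent with the statement that cyclic permutations of the preliminary sequences do not alter the result.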


CONCLUSIONS 


Codes of arbitrarily long length with zero periodic correlation except for the peak can be generated by the above method in not much more time than it takes to write down the sequence. These codes exist for every length $n = p^2$ where $p$ is a prime number. Moreover, many essentially different codes exist for each such $n$. Here $p$ different phase shifts are required rather than just two as in the binary code case.

Further work can be done to determine how many 
essentially different codes exist for each prime number. 
Also, the ‘‘ambiguity’”’ function of these codes should 
be investigated to determine their utility as a modula- 
tion for a pulse Doppler radar. For synchronization in a 
communication system, two code groups can be trans- 
mitted in a row and the received signal correlated with a 
single group. Synchronization pulses will then be generated 
as shown in Fig. 3. 


Fig. 3. Time synchronization pulses for communication system.
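The synchronization scheme can be sketched as a sliding correlation of two back-to-back code groups against a single group. This is an illustration in a noise-free setting; the function names and the choice $p = 3$ are assumptions made for the example.

```python
import cmath


def interleaved_code(p):
    """Length p*p code: element m * p + i is exp(j * 2 * pi * i * m / p)."""
    return [cmath.exp(2j * cmath.pi * i * m / p) for m in range(p) for i in range(p)]


def sliding_correlation(received, group):
    """Correlate `group` against every full-overlap window of `received`."""
    n = len(group)
    return [
        sum(received[s + k] * group[k].conjugate() for k in range(n))
        for s in range(len(received) - n + 1)
    ]


group = interleaved_code(3)
received = group + group          # two code groups transmitted in a row
y = sliding_correlation(received, group)
```

Every window of length 9 inside the doubled sequence is a cyclic shift of the group, so the output shows peaks of 9 at offsets 0 and 9 and zero in between, which is the pulse pattern of Fig. 3.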


APPENDIX I


Notation and Preliminary Remarks in the Derivation

The expression

$$A = \{a_0, a_1, a_2, \cdots, a_{n-1}\}$$

means $A$ is the sequence $a_0, a_1, \cdots, a_{n-1}$. If $B = \{b_0, b_1, \cdots, b_{n-1}\}$, then $A \times B$ stands for the periodic cross correlation function of $A$ and $B$ expressed as a sequence of length $n$. That is:

$$A \times B = \{x_0, x_1, \cdots, x_{n-1}\}$$

where

$$x_j = \sum_{k=0}^{n-1} a_{k+j}\,b_k \quad\text{where}\quad a_{n+m} = a_m.$$

Also

$$A + B = \{a_0 + b_0,\ a_1 + b_1,\ \cdots,\ a_{n-1} + b_{n-1}\}$$

and

$$A^* = \{a_0^*, a_1^*, \cdots, a_{n-1}^*\}$$

where $a^*$ denotes the complex conjugate of $a$. The periodic correlation function of a single sequence $A$ is defined as $A \times A^*$.

⁶ D. F. DeLong, Jr., "Three-Phase Codes," M.I.T. Lincoln Lab., Lexington, Mass., Group Rept. 47:28; July 24, 1959.


The Roots of Unity 


The following remarks as well as Theorem I-1 are well-known results concerning the roots of unity.⁷ The $h$th roots of unity in the complex number field are

$$1,\ e^{j(2\pi/h)},\ e^{j2(2\pi/h)},\ \cdots,\ e^{j(h-1)2\pi/h}.$$

The $h$th roots of unity form a cyclic group under multiplication. There are $h$ distinct $h$th roots of unity. If $h$ is a prime number, then all the roots except +1 are primitive roots. The following theorem is used extensively in the derivation.

Theorem I-1


If $\xi$ is an $h$th root of unity, then:

$$1 + \xi + \xi^2 + \xi^3 + \cdots + \xi^{h-1} = \begin{cases} h & \text{if } \xi = 1 \\ 0 & \text{if } \xi \neq 1. \end{cases}$$

Also note that $\xi^{h+k} = \xi^k$.
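For completeness, Theorem I-1 follows from the finite geometric series. The short derivation below is the standard argument and is offered only as a reminder of why the sum vanishes.

```latex
\begin{align*}
(1 - \xi)\left(1 + \xi + \xi^2 + \cdots + \xi^{h-1}\right) &= 1 - \xi^h = 0
  && \text{since } \xi^h = 1, \\
\xi \neq 1 \;&\Rightarrow\; 1 + \xi + \xi^2 + \cdots + \xi^{h-1} = 0, \\
\xi = 1 \;&\Rightarrow\; 1 + \xi + \xi^2 + \cdots + \xi^{h-1}
  = \underbrace{1 + 1 + \cdots + 1}_{h \text{ terms}} = h .
\end{align*}
```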


APPENDIX II 


Derivation of Results 


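Let $p$ be any prime number different from one. The $p$th roots of unity are denoted by $\xi_0, \xi_1, \cdots, \xi_i, \cdots, \xi_{p-1}$ (no particular ordering of the $\xi$'s is implied). Consider sequences of the form

$$C_i = \{1, \xi_i, \xi_i^2, \cdots, \xi_i^{p-1}\}.$$

Theorem II-1

$$C_m \times C_n = \begin{cases} \{0, 0, 0, \cdots, 0\} & \text{if } \xi_m \neq \xi_n^{-1} \\ \{p,\ p\xi_m,\ p\xi_m^2,\ \cdots,\ p\xi_m^{p-1}\} & \text{if } \xi_m = \xi_n^{-1}. \end{cases}$$

(Note that $\xi^* = \xi^{-1}$ since the $\xi_i$'s are roots of unity in the complex number field. Therefore, $\xi_m \neq \xi_n^{-1}$ if and only if $C_m \neq C_n^*$, and $\xi_m = \xi_n^{-1}$ if and only if $C_m = C_n^*$.)

Proof

$$C_m \times C_n = \{x_0, x_1, x_2, \cdots, x_{p-1}\}$$

where

$$x_j = \sum_{k=0}^{p-1} \xi_m^{k+j}\,\xi_n^k = \xi_m^j \sum_{k=0}^{p-1} \xi_m^k\,\xi_n^k = \xi_m^j \sum_{k=0}^{p-1} (\xi_m \xi_n)^k.$$

If $\xi_m = \xi_n^{-1}$, then $\xi_m \xi_n = 1$ and

$$x_j = \xi_m^j \sum_{k=0}^{p-1} 1 = p\,\xi_m^j.$$

If $\xi_m \neq \xi_n^{-1}$, then $\xi_m \xi_n = \xi \neq 1$ and

$$x_j = \xi_m^j \sum_{k=0}^{p-1} \xi^k = 0$$

from Theorem I-1.

⁷ B. L. van der Waerden, "Modern Algebra," Frederick Ungar Publishing Co., New York, N. Y., vol. I, pp. 111-115; 1953.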

Corollary to Theorem II-1 


Let $D_i$ be any cyclic permutation of $C_i$. Then

$$D_m \times D_m^* = C_m \times C_m^* = \{p,\ p\xi_m,\ p\xi_m^2,\ \cdots,\ p\xi_m^{p-1}\}$$

and $D_m \times D_n^* = \{0, 0, \cdots, 0\}$ for $m \neq n$.


Theorem II-2

If $u$ is an integer, $0 < u \leq p - 1$; then

$$\sum_{i=0}^{p-1} \xi_i^u = 0.$$

Proof

If $\xi_m \neq \xi_n$, then at least one of the $\xi$'s is different from 1. Say $\xi_n \neq 1$. Then $\xi_m = \xi_n^v$ for some integer $v$, $1 < v \leq p - 1$ or $v = 0$.

$$\xi_m^u = (\xi_n^v)^u = (\xi_n^u)^v.$$

Since the $\xi$'s are the $p$th roots of unity where $p$ is prime, $\xi_n^u$ is a primitive root of unity, call it $\xi$. Then $\xi^v \neq \xi$, since a primitive $p$th root of unity raised to the $v$th power ($1 < v \leq p - 1$) generates a different primitive root of unity and $\xi^0 = 1 \neq \xi$. Therefore:

$$\xi_m^u \neq \xi_n^u.$$

Therefore, all the $\xi_i^u$'s are different $p$th roots of unity since all the $\xi_i$'s are different $p$th roots of unity. The sum of all the $p$th roots of unity is zero from Theorem I-1.

Now that these preliminary results have been proved, 
the proof of the main result can proceed. That is to prove 
that the codes generated by the methods outlined pre- 
viously do have zero periodic correlation except for the 


1961 


main peaks. The code sequence of length $p^2$ is formed by
interleaving $p$ sequences of length $p$, which are arbitrary
cyclic permutations of the $C_i$'s defined at the beginning of
this Appendix. The code sequence is


$$D = \{y_0, y_1, y_2, \ldots, y_{p^2-1}\}$$

and the sequences $D_i$ from which it is formed are:

$$D_0 = \{y_0, y_p, y_{2p}, \ldots, y_{p^2-p}\}$$
$$D_1 = \{y_1, y_{p+1}, y_{2p+1}, \ldots, y_{p^2-p+1}\}$$
$$\vdots$$
$$D_{p-1} = \{y_{p-1}, y_{2p-1}, y_{3p-1}, \ldots, y_{p^2-1}\}$$

All the members $y_i$ of sequence $D$ are defined from the
fact that each $D_i$ is a specified cyclic permutation of the
corresponding $C_i$.
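The interleaving construction can be checked numerically. The following is a minimal sketch, assuming that $C_j$ consists of the successive powers of the $j$th root $\xi_j = e^{2\pi \sqrt{-1}\, j/p}$ (the text leaves the ordering of the roots open) and that the cyclic shifts are supplied by the caller; the function names are illustrative and do not come from the paper.

```python
import cmath

def interleaved_code(p, shifts):
    """Length p*p sequence built by interleaving cyclic shifts of C_0..C_{p-1}.

    Assumption: C_j = (xi_j**0, xi_j**1, ..., xi_j**(p-1)) with xi_j = omega**j.
    """
    omega = cmath.exp(2j * cmath.pi / p)
    code = [0j] * (p * p)
    for j in range(p):            # index of the subsequence D_j
        for k in range(p):        # position inside D_j
            code[k * p + j] = omega ** (j * ((k + shifts[j]) % p))
    return code

def periodic_autocorrelation(seq):
    """Periodic correlation of seq with its own complex conjugate, for every shift."""
    n = len(seq)
    return [sum(seq[i] * seq[(i + s) % n].conjugate() for i in range(n))
            for s in range(n)]

if __name__ == "__main__":
    corr = periodic_autocorrelation(interleaved_code(5, [0, 3, 1, 4, 2]))
    print(max(abs(c) for c in corr[1:]))   # numerically zero away from the main peak
```

For a prime $p$ the only nonzero value should be the main peak $p^2$ at zero shift, whatever shifts are chosen.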

The correlation function of D can be expressed in terms 
of the auto- and cross-correlation functions of the se- 
quences D,. 


D < 1D = pounce: ae 


*y &p?-1 } : 


Tung and Schwarz: Optimum Nonlinear Filters for Quantized Inputs 2510 


Consider the subsequences of length p: 


Geis Zesty Straps Caen teph 
fori0. <n 
= ET 
= 40°03 10) iene 
C= by {D; Dim; Dem, Dem } 


from the corollary to Theorem II-1. 


D1 Dl p-l 
pte Dy pee ee 


m=0 m=0 m=(0 


Go a 


(p00 ao) 


from Theorem [I-2. 
Therefore: 


DX D* = {p, 0,0, --- 
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Summary—Optimum least-square filters belonging to Zadeh’s 
nonlinear class $\mathfrak{N}_1$ are considered. Attention is restricted to those
systems whose present output is influenced only by a portion of the 
past input. The input signal consists of a message and noise, both 
of which are stationary random processes. It is assumed that the 
amplitude of the input time series is bounded and takes on discrete 
values at all times. This assumption leads to a nonlinear filter 
which can be realized as a quantizer or amplitude selector followed 
by a parallel set of linear filters. The system becomes optimum 
when the impulse responses of the linear filters satisfy a system 
of integral equations of the Wiener-Hopf type adapted to finite 
memory filters. By virtue of the assumptions made concerning the 
joint probability density functions of the message and noise proc- 
esses, it is found that the Fourier transforms of the kernels of 
these equations are rational functions. A method is developed for 
the solution of this set of integral equations. This method is illus- 
trated by an example, and the mean-square error of the nonlinear 
filter so obtained is compared with the best linear filter. 
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I. InTRODUCTION 


SUBSEQUENT to the classic work of Wiener,¹ the
subject of optimum linear filtering and prediction
has been extended in many directions by Zadeh and
Ragazzini² and many others. Inasmuch as a linear filter
is a degenerate case of a nonlinear filter, improved results
can usually be obtained by using a nonlinear filter. However,
due to the incomplete knowledge of characterization
of nonlinear systems as well as the inherent difficulty
involved in an analytic treatment, relatively little work
has been done with nonlinear filters.³⁻⁵ In addition, we


1 N. Wiener, “Extrapolation, Interpolation and Smoothing of
Stationary Time Series,” John Wiley and Sons, Inc., New York,
N. Y.; 1950.

2L. A. Zadeh and J. R. Ragazzini, “‘An extension of Wiener’s 
theory of prediction,” J. Appl. Phys., vol. 21, pp. 645-655; July, 
1950. 

3 J. H. Laning, Jr., “Prediction and Filtering in the Presence 
of Gaussian Interference,” Instrumentation Lab., Mass. Inst. Tech., 
Cambridge, Tech. Rept. R-27; October, 1951. 

4R. Drenick, ‘‘A nonlinear prediction theory,’ IRE Trans. on 
INFORMATION THEORY, vol. IT-4, pp. 146-162; September, 1954. 

5L. A. Zadeh, “Optimum nonlinear filters,’ J. Appl. Phys., 
vol. 24, pp. 396-404; April, 1953. 
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often find in practice that the amount of statistical data 
necessary for the design of nonlinear filters far exceeds 
what is available. This paper, therefore, considers a class 
of nonlinear systems whose optimization requires only 
second-order statistics. 

The relation between the input $x(t)$ and the output
$y(t)$ of the class of filters to be considered here is expressed
by

$$y(t) = \int_0^T K[x(t-\tau), \tau]\, d\tau \tag{1}$$

where $T$ is the memory of the filter. Systems characterized
by (1) have been designated by Zadeh⁵ as class $\mathfrak{N}_1$. In
his paper, Zadeh has derived a sequence of integral equations
for a class of optimum filters and has shown that as
the filter structure becomes more complicated, more and
more information is necessary about the statistics of the
input time series. It is found that only the second-order
probability density functions of the message and the
noise processes are necessary for obtaining the filter
belonging to class $\mathfrak{N}_1$.

Since data supplied by computers and devices using 
digital readout are quantized, this information should be 
incorporated into the design of optimum filters. It is 
shown in this paper that this information can be used 
fruitfully by adopting the class of filters given by (1). 
The conventional mean-square error criterion is used. 

Section II discusses the assumptions made which lead 
to the structure of the nonlinear filter in the form of a 
quantizer or amplitude selector followed by a parallel 
set of linear filters. A set of integral equations for the 
optimum impulse responses of these linear filters is ob- 
tained. The equations are of the Wiener-Hopf type, but 
are formulated for filters with a finite memory of $T$
seconds. In Section III, a method is developed for solving 
these integral equations. Use of the method for infinite 
memory filters is also discussed. Section IV gives an 
example of pure prediction in which the method is applied. 
The improvement in mean-square error of the nonlinear 
filter over the best linear filter is then obtained. 


II. MATHEMATICAL FORMULATION AND THE STRUCTURE
OF THE NONLINEAR FILTER 


Let the input signal $x(t)$ be the sum of two independent
discrete-amplitude, wide-sense stationary random processes,
namely, the message $m(t)$ and the undesired noise
$n(t)$,

$$x(t) = m(t) + n(t). \tag{2}$$

The problem is to find a filter belonging to class $\mathfrak{N}_1$ such
that the difference between the actual output from the
filter and a desired output is minimized in some sense.
Let $g[m(t+\alpha)]$ $(\alpha \ge 0)$ represent the desired output,
where $g(\cdot)$ is an odd function of its argument, and let
$\epsilon(t)$ be the error between the desired output and the
actual output $y(t)$,

$$\epsilon(t) = y(t) - g[m(t+\alpha)]. \tag{3}$$
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As in Wiener’s theory," the least-mean-square error 
criterion will be used. For convenience, the ensemble
average of $\epsilon(t)$ is assumed to be zero, so that the optimum
predictor is that which minimizes the variance of $\epsilon(t)$.
Let us denote the ensemble average by $\langle\ \rangle$. The filter is
therefore optimum when

$$\langle \epsilon^2(t) \rangle = \text{minimum} \tag{4}$$

and

$$\langle \epsilon(t) \rangle = 0. \tag{5}$$


The following notation is used for convenience:

$x_1 = x(t - \tau_1)$
$x_2 = x(t - \tau_2)$
$m_\alpha = m(t + \alpha)$
$p_m(m_\alpha) = p_m[m(t+\alpha)]$
= first-order probability density function of $m(t+\alpha)$
$p_{xx}(x_1, x_2) = p_{xx}[x(t-\tau_1), x(t-\tau_2); \tau_2 - \tau_1]$
= joint probability density function of $x(t-\tau_1)$ and $x(t-\tau_2)$
$p_{xm}(x_1, m_\alpha) = p_{xm}[x(t-\tau_1), m(t+\alpha); \tau_1 + \alpha]$
= joint probability density function of $x(t-\tau_1)$ and $m(t+\alpha)$.


By the usual technique of variational calculus, we find’ 
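that a stationary solution occurs when the kernel $K[x_2, \tau_2]$
satisfies the integral equation

$$\int_{-\infty}^{\infty} g(m_\alpha)\, p_{xm}(x_1, m_\alpha)\, dm_\alpha = \int_{-\infty}^{\infty}\!\int_0^T K[x_2, \tau_2]\, p_{xx}(x_1, x_2)\, d\tau_2\, dx_2, \qquad 0 \le \tau_1 \le T. \tag{6}$$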


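It can be readily shown that this stationary solution does,
in fact, give minimum mean-square error. The minimum
mean-square error is given by

$$\text{Min}\,\langle \epsilon^2(t) \rangle = \langle (g[m_\alpha])^2 \rangle - \left\langle \left( \int_0^T K[x_2, \tau_2]\, d\tau_2 \right)^2 \right\rangle. \tag{7}$$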


Since $m(t)$ and $n(t)$ are discrete-amplitude processes,
the joint probability density functions $p_{xx}(x_1, x_2)$ and
$p_{xm}(x_1, m_\alpha)$ can be expressed in terms of delta functions.
Let the message $m(t)$ have $2M$ amplitude levels symmetrical
with respect to zero and designated by $b_j$, and let the
input signal $x(t)$ have $2N$ amplitude levels symmetrical
with respect to zero and designated by $c_i$ $(N > M)$. Then

$$p_{xx}(x_1, x_2) = \sum_{i=-N}^{N} \sum_{j=-N}^{N} A_{ij}(\tau_2 - \tau_1)\, \delta(x_1 - c_i)\, \delta(x_2 - c_j), \qquad i, j \ne 0, \quad (c_i = -c_{-i}) \tag{8}$$
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nd 
$$p_{xm}(x_1, m_\alpha) = \sum_{i=-N}^{N} \sum_{j=-M}^{M} B_{ij}(\tau_1 + \alpha)\, \delta(x_1 - c_i)\, \delta(m_\alpha - b_j), \qquad i, j \ne 0, \tag{9}$$


where $\delta(\cdot)$ is the Dirac delta function. Further, since the
input signal $x(t)$ can only assume $2N$ discrete values, it is
seen that the kernel $K$ in (1) depends, during any interval
in which $x(t)$ assumes a value $c_i$, only on $c_i$ and $\tau$.
Hence, $K$ is completely specified by the $2N$ functions
$\{K(c_i, \tau);\ i = \pm 1, \ldots, \pm N\}$. The nonlinear filter therefore
can be represented by the structure shown in Fig. 1,
where $K(c_i, \tau)/c_i$ are the unit impulse responses of linear
filters. The system becomes linear when

$$\frac{K(c_i, \tau)}{c_i} = \frac{K(c_j, \tau)}{c_j} \qquad \text{for all } i, j. \tag{10}$$


Fig. 1. Schematic representation of the nonlinear filter (an amplitude selector driven by $x(t)$, followed by a parallel set of linear filters).
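The equivalence between the kernel form (1) and the selector-plus-filter-bank structure of Fig. 1 can be illustrated in discrete time. This is a minimal sketch under stated assumptions: time is sampled with step `dt`, the kernel is stored as a hypothetical dictionary mapping each amplitude level to its sampled values $K(c, m\,dt)$, and the input is taken as zero before the first sample. None of these names come from the paper.

```python
def nonlinear_filter(x, kernel, dt):
    """Discretized form of (1): y[n] = sum over m of K(x[n-m], m) * dt."""
    memory = len(next(iter(kernel.values())))
    y = []
    for n in range(len(x)):
        acc = 0.0
        for m in range(memory):
            if n - m >= 0:
                acc += kernel[x[n - m]][m] * dt
        y.append(acc)
    return y

def selector_bank(x, kernel, dt):
    """Same output from an amplitude selector followed by linear filters K(c, .)/c."""
    memory = len(next(iter(kernel.values())))
    y = [0.0] * len(x)
    for level, k_level in kernel.items():
        # Selector branch: passes the input while it sits at this level, zero otherwise.
        branch = [v if v == level else 0 for v in x]
        h = [k / level for k in k_level]      # impulse response K(c, tau)/c
        for n in range(len(x)):
            for m in range(memory):
                if n - m >= 0:
                    y[n] += h[m] * branch[n - m] * dt
    return y

if __name__ == "__main__":
    base = {1: [0.5, 0.25, 0.125], 2: [0.8, 0.3, 0.1]}
    kernel = dict(base)
    kernel.update({-c: [-v for v in k] for c, k in base.items()})  # odd kernel
    x = [1, 2, -1, -2, 2, 1, -1, 2]
    print(nonlinear_filter(x, kernel, 0.1))
    print(selector_bank(x, kernel, 0.1))
```

Both routines should agree to rounding error, because each branch of the bank multiplies $K(c_i, \tau)/c_i$ by an input that equals $c_i$ whenever the branch is active.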


Substituting (8) and (9) into (6) and equating the corresponding
coefficients of the delta functions associated with
$x_1$, we find that the system becomes optimum when the
set of functions $K(c_j, \tau_2)$ satisfies the following system of
integral equations:

$$\sum_{j=-M}^{M} g(b_j)\, B_{ij}(\tau_1 + \alpha) = \sum_{j=-N}^{N} \int_0^T A_{ij}(\tau_2 - \tau_1)\, K(c_j, \tau_2)\, d\tau_2, \qquad 0 \le \tau_1 \le T, \quad i = \pm 1, \ldots, \pm N. \tag{11}$$

For arbitrary $A_{ij}(\tau_2 - \tau_1)$ and $B_{ij}(\tau_1 + \alpha)$, the
simultaneous set of integral equations in (11) is too
formidable for any analytic solution. We shall now make
two reasonable assumptions regarding the probability
density functions so that a direct solution can be found.

1) The joint probability density functions $p_{mm}$ of the
message and $p_{nn}$ of the noise are symmetrical with respect
to their arguments as well as symmetrical with respect to
the origin.

2) For all $i$ and $j$, the quantities $[A_{ij}(\tau_2 - \tau_1) - A_{i,-j}(\tau_2 - \tau_1)]$
can be approximated as the sums of a
finite number of decaying exponentials. This is the logical
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extension of the usual assumption that the autocorrelation 
function, which is given by

$$R_{xx}(\tau_2 - \tau_1) = \langle x_1 x_2 \rangle = \sum_i \sum_j c_i c_j A_{ij}(\tau_2 - \tau_1),$$

may be approximated by a sum of exponential functions.
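The relation between the joint level probabilities $A_{ij}$ and the autocorrelation function can be illustrated with sample estimates. The sketch below assumes a sampled, quantized sequence and estimates $A_{ij}$ at an integer lag by counting pairs; the helper names are hypothetical and the estimator is an illustration, not a procedure given in the paper.

```python
from collections import Counter

def joint_level_probabilities(x, lag):
    """Estimate A_ij(lag) = Pr[x(n) = c_i, x(n + lag) = c_j] by counting pairs."""
    total = len(x) - lag
    pairs = Counter(zip(x[:total], x[lag:]))
    return {pair: count / total for pair, count in pairs.items()}

def autocorrelation_from_joint(joint):
    """R_xx(lag) = sum over i, j of c_i * c_j * A_ij(lag)."""
    return sum(ci * cj * prob for (ci, cj), prob in joint.items())

if __name__ == "__main__":
    x = [1, 2, 2, -1, -2, -1, 1, 2, 1, -2, -2, 1]
    joint = joint_level_probabilities(x, lag=2)
    direct = sum(x[n] * x[n + 2] for n in range(len(x) - 2)) / (len(x) - 2)
    print(autocorrelation_from_joint(joint), direct)
```

The weighted sum over level pairs reproduces the direct lagged-product average, which is the sample counterpart of the identity above.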
We will now show that requiring the expected value of
the error to be zero implies that

$$K(c_i, \tau) = -K(c_{-i}, \tau). \tag{12}$$


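From (5) and the fact that the ensemble mean of $g(\cdot)$
is zero for all $t$, we conclude that

$$\langle y(t) \rangle = 0; \tag{13}$$

consequently

$$\int_0^T \langle K[x(t-\tau), \tau] \rangle\, d\tau = 0. \tag{14}$$

Let $f_i$ denote the probability that $x(t-\tau)$ takes on the
value $c_i$. Then in view of assumption 1),

$$f_i = f_{-i}, \qquad (i = 1, 2, \ldots, N). \tag{15}$$

Eq. (14) becomes

$$\int_0^T \sum_{i=1}^{N} f_i\, [K(c_i, \tau) + K(c_{-i}, \tau)]\, d\tau = 0. \tag{16}$$

In order for (16) to hold for any set of $f_i$ satisfying
$\sum_{i=1}^{N} 2f_i = 1$, it is necessary that

$$K(c_i, \tau) + K(c_{-i}, \tau) = 0, \tag{17}$$

which establishes (12).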

As a result of assumption 1), it is seen that $p_{xx}(x_1, x_2)$
is also symmetrical with respect to its arguments and with
respect to the origin. On the other hand, $p_{xm}(x_1, m_\alpha)$ is
only symmetrical about the origin. In addition, it is
shown in Appendix I that $p_{xx}(x_1, x_2)$ is an even function
in the variable $\tau_2 - \tau_1$. In terms of the coefficients in (8)
and (9), these consequences of assumption 1) can be
expressed as

$$A_{ij}(|\tau_2 - \tau_1|) = A_{ji}(|\tau_2 - \tau_1|) = A_{-i,-j}(|\tau_2 - \tau_1|), \tag{18}$$
$$B_{ij}(\tau_1 + \alpha) = B_{-i,-j}(\tau_1 + \alpha). \tag{19}$$

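Let us define

$$z_i(\tau_1 + \alpha) = \sum_{j=-M}^{M} B_{ij}(\tau_1 + \alpha)\, g(b_j). \tag{20}$$

Since $g(\cdot)$ is an odd function, it follows by using (19) that

$$z_i(\tau_1 + \alpha) = -z_{-i}(\tau_1 + \alpha), \qquad (i = \pm 1, \ldots, \pm N). \tag{21}$$

Eq. (11) can now be written

$$z_i(\tau_1 + \alpha) = \sum_{j=1}^{N} \int_0^T [A_{ij}(|\tau_2 - \tau_1|) - A_{i,-j}(|\tau_2 - \tau_1|)]\, K(c_j, \tau_2)\, d\tau_2, \qquad 0 \le \tau_1 \le T, \quad (i = \pm 1, \ldots, \pm N). \tag{22}$$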
By using (18) and (21), it is seen that $N$ of the $2N$ equations
in (22) are actually redundant. Let us replace $\tau_1$
and $\tau_2$ by $t$ and $\tau$, respectively, and let

$$w_{ij}(|t - \tau|) = A_{ij}(|t - \tau|) - A_{i,-j}(|t - \tau|), \tag{23}$$
$$K_j(\tau) = K(c_j, \tau). \tag{24}$$

Eq. (22) finally reduces to

$$z_i(t + \alpha) = \sum_{j=1}^{N} \int_0^T w_{ij}(|t - \tau|)\, K_j(\tau)\, d\tau, \qquad 0 \le t \le T, \quad (i = 1, 2, \ldots, N), \tag{25}$$


which is a system of integral equations of the Wiener- 
Hopf type, except that the upper limit of each integral 
is T instead of infinity. In addition, the Fourier transforms 
of the kernels are rational functions. From (7), we see 
that the minimum mean-square error is given by 


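$$\text{Min}\,\langle \epsilon^2(t) \rangle = \langle (g[m_\alpha])^2 \rangle - 2 \sum_{i=1}^{N} \int_0^T z_i(t + \alpha)\, K_i(t)\, dt. \tag{26}$$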


III. METHOD OF SOLUTION


It was assumed that $w_{ij}(|\tau|)$ is the sum of a number
of decaying exponential functions; hence its Fourier
transform, defined as

$$W_{ij}(\lambda^2) = \int_{-\infty}^{\infty} w_{ij}(|\tau|)\, e^{-j\lambda\tau}\, d\tau, \tag{27}$$

is a rational function of $\lambda^2$. Suppose⁶

$$W_{ij}(\lambda^2) = \frac{D_{ij} P_{ij}(\lambda^2)}{Q(\lambda^2)}, \tag{28}$$

where $D_{ij}$ are constants, $Q(\lambda^2)$ is of order $d$ and $P_{ij}(\lambda^2)$
is of order $n_{ij}$ $(n_{ij} \le d)$. We shall derive a necessary and
sufficient condition under which an absolutely integrable
solution of (25) exists, and then state the conditions
under which the sufficient condition can be satisfied.
$K_i(t)$ is absolutely integrable if

$$\int_0^T |K_i(t)|\, dt < \infty, \qquad (i = 1, 2, \ldots, N), \tag{29}$$

which is the usual stability condition for linear systems.
A system is, for our purposes, defined to be stable if all 
bounded inputs result in bounded outputs. The approach 
here is first to transform (25) to a simpler system of 
integral equations which can easily be analyzed. It is 
then shown that the solution of the modified system of 
integral equations does, in fact, satisfy (25). The following 
notation will be used: 


$\hat{w}(|\tau|)$ = inverse Fourier transform of $1/Q(\lambda^2)$.
$[DP(-d^2/dt^2)]$ = $N \times N$ square matrix whose elements
are the linear operators $D_{ij}P_{ij}(-d^2/dt^2)$.


6 The common denominator of all W;;(X?) is used. 
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K(t), yé + a), z(t + a) 
are vectors whose components are the time functions $K_i(t)$, $y_i(t + \alpha)$,
$z_i(t + \alpha)$, respectively $(i = 1, 2, \ldots, N)$.

Let $y(t + \alpha)$ be any solution satisfying the following
system of differential equations

$$z(t + \alpha) = \left[ DP\!\left( -\frac{d^2}{dt^2} \right) \right] y(t + \alpha), \qquad 0 \le t \le T, \tag{30}$$

and let the modified set of integral equations be given by

$$y(t + \alpha) = \int_0^T \hat{w}(|t - \tau|)\, K(\tau)\, d\tau, \qquad 0 \le t \le T. \tag{31}$$

It is shown in Appendix II that the solution of (31) also
satisfies (25). In fact, it can be shown⁷ that all solutions
of (25) are necessarily the solutions of (31). This latter
property, however, is immaterial since any solution (if
it is not unique) of (25) will give the same mean-square
error.

We now investigate the solution of the modified system
of integral equations (31). Appendix III shows that if
$|K(t)| < \infty$ $(0 \le t \le T)$, then $K(t)$ satisfies

$$K(t) = Q\!\left( -\frac{d^2}{dt^2} \right) y(t + \alpha), \qquad 0 < t < T. \tag{32}$$

However, certain conditions on $y(t + \alpha)$ at $t = 0$ and
$t = T$ are necessary in order that $K(t)$ so obtained from
(32) do satisfy (31). Those conditions are obtained by
substituting (32) into (31) and solving the resultant
equations as an identity.⁷
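Let us first establish a useful result. From the definition
of $\hat{w}(|t|)$,

$$\hat{w}(|t|) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{j\lambda t}}{Q(\lambda^2)}\, d\lambda, \tag{33}$$

we obtain the corresponding differential equation

$$Q\!\left( -\frac{d^2}{dt^2} \right) \hat{w}(|t|) = \delta(t). \tag{34}$$

In particular, let

$$Q(\lambda^2) = \sum_{k=0}^{d} q_{2k}\, \lambda^{2k}. \tag{35}$$

Then (34) becomes⁹

$$\sum_{k=0}^{d} (-1)^k q_{2k}\, \hat{w}^{(2k)}(|t|) = \delta(t). \tag{36}$$

This relation will be used later on.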
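A first-order instance makes relation (34) concrete. The sketch below assumes the illustrative choice $Q(\lambda^2) = \lambda^2 + a^2$ (so $d = 1$), for which the inverse transform of $1/Q$ is $e^{-a|t|}/(2a)$; it then checks by finite differences that the operator $-d^2/dt^2 + a^2$ annihilates this function away from the origin, and that the slope jumps by $-1$ at $t = 0$, which is what produces the unit delta function. The function names and the choice of $Q$ are assumptions made for illustration.

```python
import math

def w_hat(t, a):
    """Inverse Fourier transform of 1/(lambda^2 + a^2): exp(-a|t|) / (2a)."""
    return math.exp(-a * abs(t)) / (2.0 * a)

def apply_q(t, a, h=1e-3):
    """Finite-difference value of (-d^2/dt^2 + a^2) w_hat at t (use |t| > h)."""
    second = (w_hat(t + h, a) - 2.0 * w_hat(t, a) + w_hat(t - h, a)) / (h * h)
    return -second + a * a * w_hat(t, a)

def derivative_jump(a, h=1e-6):
    """w_hat'(0+) minus w_hat'(0-), estimated with one-sided differences."""
    right = (w_hat(h, a) - w_hat(0.0, a)) / h
    left = (w_hat(0.0, a) - w_hat(-h, a)) / h
    return right - left

if __name__ == "__main__":
    print(apply_q(0.7, 2.0))      # close to zero away from the origin
    print(derivative_jump(2.0))   # close to -1, the weight of the delta function
```

Integrating $-\hat{w}''$ across the origin gives minus the slope jump, that is, $+1$, in agreement with the right-hand side of (34).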
Using (35), (32) can be rewritten as

$$K(t) = \sum_{k=0}^{d} (-1)^k q_{2k}\, y^{(2k)}(t + \alpha), \qquad 0 < t < T. \tag{37}$$


7 F. Tung, “A Class of Optimum Nonlinear Filters for Quantized
Inputs,” Dept. of Elec. Engrg., Columbia University, New York, 
N. Y., Tech. Rept. T-2/N; July, 1960. 

8 W. B. Davenport, Jr., and W. L. Root, “An Introduction to 
the Theory of Random Signals and Noise,” McGraw-Hill Book Co., 
Inc., New York, N. Y., Appendix II; 1958. 

9 $\hat{w}^{(2k)}(|t|)$ denotes $d^{2k}\hat{w}(u)/du^{2k}\,\big|_{u = |t|}$.
Multiplying both sides of (37) by $\hat{w}(|t - \tau|)$ and integrating
the resultant expression with respect to $\tau$ from
0 to $T$, we obtain

$$\int_0^T \hat{w}(|t - \tau|)\, K(\tau)\, d\tau = \int_0^t \sum_{k=0}^{d} (-1)^k q_{2k}\, y^{(2k)}(\tau + \alpha)\, \hat{w}(t - \tau)\, d\tau + \int_t^T \sum_{k=0}^{d} (-1)^k q_{2k}\, y^{(2k)}(\tau + \alpha)\, \hat{w}(\tau - t)\, d\tau, \qquad 0 \le t \le T, \tag{38}$$

where we have separated the range of integration into two
regions; for $0 \le \tau \le t$ the kernel is $\hat{w}(t - \tau)$, while for
$t \le \tau \le T$ the kernel is $\hat{w}(\tau - t)$. After integrating the
right-hand side of (38) by parts $2d$ times and making
use of the property that¹⁰

$$\hat{w}^{(k)}(0+) = (-1)^k\, \hat{w}^{(k)}(0-), \tag{39}$$

we are left with integrals

$$(-1)^k q_{2k} \int_0^T y(\tau + \alpha)\, \hat{w}^{(2k)}(|t - \tau|)\, d\tau$$

unintegrated as they occur at every other step. In addition,
terms involving the derivatives of $y(t + \alpha)$ at $t = 0$ and
$t = T$ are carried over from each of the $2d$ integrations.

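It can readily be shown⁷ that the right-hand side of
(38) can be expressed as

$$\sum_{e=1}^{2d} \hat{w}^{(e-1)}(t)\, Y_e + \sum_{e=1}^{2d} \hat{w}^{(e-1)}(T - t)\, Z_e + \int_0^T \sum_{k=0}^{d} (-1)^k q_{2k}\, \hat{w}^{(2k)}(|t - \tau|)\, y(\tau + \alpha)\, d\tau,$$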


where Y, and Z, are vectors whose components are Y,;, 
and Z;.(4 = 1,2,---,N). Y;, and Z;, are linear combina- 
tions of the derivatives of y;(¢ + a) att = Oandt = T 
given by 


d 
ee — (1) ay, ~@), @ = 1,2; -- =, N), (40) 
k=ke 
and 
d 
ae = ys (Cerone AL oh a), 
k=ke 
(e152, N), (41) 
where 
etl. , 
ae: 9 if e is odd 
e/2 if e is even 


10 This can be verified by taking the limits of the derivatives 
of &(t) of any order on both sides of ¢ = 0. 
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By (386), the summation in the last integral of the above 
expression is $\delta(t - \tau)$. Hence (38) can be reduced to

$$\int_0^T \hat{w}(|t - \tau|)\, K(\tau)\, d\tau = y(t + \alpha) + \sum_{e=1}^{2d} \hat{w}^{(e-1)}(t)\, Y_e + \sum_{e=1}^{2d} \hat{w}^{(e-1)}(T - t)\, Z_e, \qquad 0 \le t \le T. \tag{42}$$

We observe from (42) that, if the solution obtained from
the differential equation (32) is to satisfy the system of
integral equations (31), means should be provided to take
care of the two terms added to $y(t + \alpha)$ in (42).

Let us add to the solution $K(t)$ of (32) two terms of the
form

$$b\,\delta(t) + c\,\delta(t - T),$$

where $b$ and $c$ are vectors consisting of elements
$b_i$, $c_i$ $(i = 1, 2, \ldots, N)$. Then it is clear that (42) is
identically satisfied if
2d 
yD SG Se AOD (43) 
and 
2d 
O°" — OZ, = AT — de. (44) 


e=1 


Let the roots of $Q(\lambda^2) = 0$ be denoted by $\pm\sqrt{-1}\,a_i$,
and let $A_i$ be the residues of $1/Q(\lambda^2)$ at the poles
$\lambda = \sqrt{-1}\,a_i$ $(i = 1, 2, \ldots, d)$. We see from (34) that
$\hat{w}(t)$ can be expressed as


NOV =D ee in So 
i=1 
It follows that 
d 
OU (Dis = ay ae new re mC 
i=1 


Upon equating the corresponding coefficients of $e^{-a_i t}$ $(i = 1,
2, \ldots, d)$ in (43) and (44), we obtain $2Nd$ algebraic
equations. Therefore, the necessary and sufficient condition
for an absolutely integrable solution of (25) to
exist is that the $2Nd$ equations be identically satisfied.

A sufficient condition needed to obtain a solution of
these $2Nd$ equations is that they contain $2Nd$ undetermined
constants. Since $2N$ constants are contributed
by $b$ and $c$, $2Nd - 2N = 2N(d - 1)$ constants must
appear in $Y_e$ and $Z_e$. This requires, in view of (40) and
(41), $2N(d - 1)$ constants in $y(t + \alpha)$. Hence, the determinant
$|DP(\lambda^2)|$ in (30) must be a polynomial in $\lambda$
of degree $2N(d - 1)$. If the coefficient matrix of the $2Nd$
equations is nonsingular, the solution is then unique.

If some $n_{ij} = d$ and $|DP(\lambda^2)|$ is itself of order $2Nd$
in $\lambda$, then it can be shown that $b$ and $c$ are zero vectors,
and hence the solution is bounded. It is of interest to note


11 This does not violate (29), since (fee \no(é) | tata —wile 
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that precisely these conditions arise if we consider the 
set of integral equations of the second kind 

\ iH ds 
pK) = of w(|¢— 7 KC) dr, 


1 


O< P< 7 VG Si -,N), (46a) 


ag Liigeegees 


where p is an eigenvalue. These equations may be written 


N r 
Of > [ &,;(| t—Tr |)K ;(7) ar, 


O5¢<T, @=1,2)5:-5N), “GSD 

where 
é,((f—7)= eee! f= 4 
@;;(| t aie. ») ix j. 


The theory developed above immediately shows that 
(45b), and hence (45a), always possesses a solution if the 
Fourier transform of w;;(| 2 |) is a rational function in \” 
expressed by (28) with n;; < d. The Fourier transform of 
w,;(| 2 |) is therefore also a rational function in \” in which 
at least the degrees n;; of the diagonal terms are equal to 
d. There are an infinite number of bounded solutions 
K(t). These solutions correspond to the set of eigenvalues 
p which make the coefficient matrix of the 2Nd equations 
singular. 

So far, we have only considered the finite memory
filter. The result, however, can be applied directly to the
infinite memory filter, in which $T$ is infinite. For the
infinite memory filter, the stability condition becomes

$$\int_0^\infty |K_i(t)|\, dt < \infty,$$

which implies that:

1) The roots of

$$|DP(\lambda^2)| = 0 \tag{46}$$

cannot be purely real.

2) The terms in $K(t)$ which belong to the roots of (46)
in the lower half of the $\lambda$ plane should be discarded.


In the infinite memory case, it is necessary to use only 
(43). The number of equations as well as the number of 
unknowns are reduced by a factor of two. 


IV. EXAMPLE 


For purposes of illustration, we shall consider here a 
simple example of pure prediction. Let the desired output
be $m(t + \alpha)$. The amplitude of the input process at any
time can take on any one of the four values $\pm 1$ and
$\pm 2$ with equal probability. The second-order probability
density function of the input process is assumed to be¹²


12 §(7, 7) denotes 6(m1 — 1)d(™m: — 7). 
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Dram tiny Ma) = aby ile hoc eee) | 
+ 6(1, —1) + 6(—2, 2) + 6(—1, 1)] 
“eh [ —= O  Od) 
+ 6(1, —2) + 6(-—2, 1) + 6(—1, 2)] 
+ dell — Pl" 8(2, 1) 
+ 6(1, 2) + 6(—2, —1) + 6(—1, —2)] 
+ [1 ate t ies =e ane ey ae 
-[6(2, 2) + 6(1, 1) + 6(—1, —1) + O(-2, —2)].) 


Let $\beta_1 = \beta_2 = 1$, $\beta_3 = 2$, and let $e^{-\alpha}$ be denoted by $k$.
Using (8), (9), (20) and (23), we see that the optimum
integral equations (25) are


ds(5ke™" — k’e~*") 


T 


= 7 f [Be +67 "K(x dr 
(9) 
0s 
+ 7 [ero Seat) an 
4 (47) 
ds (7ke! + k’e~*') . 
i 
= vf fe! = 6 PK (2) dr 
0 


a I Eve ea diheke ani 


Oia 


The Fourier transforms of the respective kernels are’ 


: 3 10(A* + 2.8) 
Te = : = ; 
WW u(r) W22(d) Ne A= 5d” =e 4 j 
and ‘ 
Nees 2 Ped + 2) : | 


Eq. (30) can therefore be written as 
| 5ke* — ke?" 


L7ke' + kre" | 


Sm 2 2 = 
10( i i s) ee as 2) | 
= é (te | 
d d° | 
ae 5D) pee 
(fe+2)  r0(-4 +28) | 
0<1<T, 
from which we obtain the solution 
y(t) = ae’ + en + ase’ + ae” + 1/6ke™' 
Us) = — Gey! ~ ee + ase’ + (a, — k?/12)e7*! 
+ 1/3ke~‘ 
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here a, to dy are arbitrary constants. K(¢) and y(t) are 
plated by (32) which, in this case, becomes 


K(f) = Ee — 5 on + if, Qe o 
Jpon simplification, we find that 
BRO =ge" "ge" 
S a OS ver 
GO) = ae - ee 


where $g_1$ and $g_2$ are arbitrary constants.

Now let us add $b_1\delta(t) + c_1\delta(t - T)$ and $b_2\delta(t) + c_2\delta(t - T)$
to $K_1(t)$ and $K_2(t)$, respectively, and substitute
the complete expression in (47). We obtain, upon
letting $T = 100$ msec,
Op = 121.375 X10 “Ue =k) 
go = 2.775 X 10 °(h’ — k) 
b, = (1291.9k — 291.9k?) x 107° 


bo = (1708.1k + 291.9k”) x 10° 


CG, = 37.276 X 10 °*(k’ — k) 
Cy == Gy. 
Che complete solution for K(¢) is 
eae 
ee Oe hh 01) es 
2 Se a _ que’ 
+b, 6) + e 6(¢ — 0.1) 
K(/) = tole 


It is clear that $K_2(t) \ne 2K_1(t)$ (excluding the trivial case
of $k = 1$). Consequently, the optimum predictor for this
problem is always nonlinear. Using (26), we obtain the
normalized mean-square error for this nonlinear filter,


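$$\text{Min}\,\langle \epsilon^2 \rangle_{\text{nonlinear}} = 1 - \frac{1}{40}\left[ 37.0164k^2 + 1.9814k^3 + 1.0022k^4 \right]. \tag{48}$$

The normalized mean-square error of the corresponding
linear filter is found to be

$$\text{Min}\,\langle \epsilon^2 \rangle_{\text{linear}} = 1 - \frac{1}{40}\left[ 36.1046k^2 + 3.7920k^3 + 0.1034k^4 \right]. \tag{49}$$

It is seen that the difference between (49) and (48) is
always greater than or equal to zero; that is

$$\text{Min}\,\langle \epsilon^2 \rangle_{\text{linear}} - \text{Min}\,\langle \epsilon^2 \rangle_{\text{nonlinear}} = \frac{1}{40}\left[ 0.9118k^2 - 1.8106k^3 + 0.8988k^4 \right] \ge 0.$$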
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Consequently, we have shown that, for this particular 
example, improved results can be obtained by using a 
nonlinear filter of class $\mathfrak{N}_1$ in place of a linear filter. The
amount of improvement depends on the problem at hand.
For the numerical values we have assumed, the improvement
is negligible. For $k = 0.5$, the improvement is about
0.19 per cent. Inasmuch as there are five parameters in
this example ($\beta_1$, $\beta_2$, $\beta_3$, $k$ and $T$), it is unlikely that any
general statement can be made with regard to their 
effects on the mean-square error without first obtaining 
an explicit expression in terms of these parameters. 
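The comparison between (48) and (49) is easy to evaluate numerically. The sketch below takes the printed coefficients and assumes the common factor 1/40 and the powers $k^2$, $k^3$, $k^4$, which is the reading consistent with the stated difference polynomial; it also assumes that the quoted improvement is the error reduction expressed as a percentage of the linear-filter error. The function names are illustrative.

```python
def mse_nonlinear(k):
    """Normalized mean-square error of the nonlinear filter, as read from (48)."""
    return 1.0 - (37.0164 * k**2 + 1.9814 * k**3 + 1.0022 * k**4) / 40.0

def mse_linear(k):
    """Normalized mean-square error of the best linear filter, as read from (49)."""
    return 1.0 - (36.1046 * k**2 + 3.7920 * k**3 + 0.1034 * k**4) / 40.0

def improvement_percent(k):
    """Error reduction relative to the linear filter, in per cent (assumed definition)."""
    return 100.0 * (mse_linear(k) - mse_nonlinear(k)) / mse_linear(k)

if __name__ == "__main__":
    for k in (0.5, 0.9):
        print(k, round(improvement_percent(k), 3))
```

Under these assumptions the values for $k = 0.5$ and $k = 0.9$ come out near 0.19 and 0.1 per cent, matching the finite-memory figures quoted in the text. The difference polynomial factors as $k^2(1 - k)(0.9118 - 0.8988k)$, so it is nonnegative for $0 \le k \le 1$.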

For purposes of comparison, the case of the infinite
memory filter was also examined. For $T = \infty$, it is found
that
2 Lvs, 
K(f) = 121.3180 X 107*(k? — he 
4 (1292.898k — 292.898k") X 107° a(t) 
Kit = = 1013130610 G ae 
+ (1708.102k + 292.898k%) X 107° 4(2)’ 
2 o 


Table I shows the improvement of the class $\mathfrak{N}_1$ filter over
the linear filter for two different values of $k$.


TABLE I

k      T (sec)   Improvement
0.5    0.1       0.19 per cent
0.5    ∞         (illegible) per cent
0.9    0.1       0.1 per cent
0.9    ∞         2.6 per cent


APPENDIX I


We wish to show that the necessary and sufficient
conditions for the joint probability distribution function

$$\Pr[x(t - \tau_1) \le \rho_1;\ x(t - \tau_2) \le \rho_2] \tag{50}$$

to be an even function in $\tau_1 - \tau_2$ are:

1) $x(t)$ is a strictly stationary process of second order,
and

2) The probability distribution function (50) is symmetrical
with respect to the arguments $\rho_1$ and $\rho_2$,
namely

$$\Pr[x(t - \tau_1) \le \rho_1;\ x(t - \tau_2) \le \rho_2] = \Pr[x(t - \tau_1) \le \rho_2;\ x(t - \tau_2) \le \rho_1]. \tag{51}$$

Proof 
Condition 1) implies that (50) is only a function of
$\tau_1 - \tau_2$. Let $\beta = \tau_1 - \tau_2$; then (50) can be written as

$$\Pr[x(t) \le \rho_1;\ x(t + \beta) \le \rho_2]. \tag{52}$$
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Using (51), we see that (52) is equivalent to 

$$\Pr[x(t) \le \rho_2;\ x(t + \beta) \le \rho_1]$$

which, in view of condition 1), can also be written as

$$\Pr[x(t - \beta) \le \rho_2;\ x(t) \le \rho_1],$$

or equivalently

$$\Pr[x(t) \le \rho_1;\ x(t - \beta) \le \rho_2]. \tag{53}$$

Sufficiency of the conditions is therefore proved by
comparing (53) with (52). To prove their necessity, we
only need to proceed in the reverse direction. If (50)
is to be a function of only $\tau_1 - \tau_2$, condition 1) is necessary.
Let $\beta = \tau_1 - \tau_2$; then the statement that (50) is an even
function in $\tau_1 - \tau_2$ implies that

$$\Pr[x(t) \le \rho_1;\ x(t + \beta) \le \rho_2] = \Pr[x(t) \le \rho_1;\ x(t - \beta) \le \rho_2]. \tag{54}$$

Using condition 1), we see that the right-hand side of
(54) can be written as

$$\Pr[x(t + \beta) \le \rho_1;\ x(t) \le \rho_2] = \Pr[x(t) \le \rho_2;\ x(t + \beta) \le \rho_1],$$

which shows that condition 2) or (51) is satisfied.
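The equivalence proved above can be illustrated on a discrete example. The sketch below uses a stationary Markov chain as a hypothetical stand-in for $x(t)$ (the paper does not assume a Markov model): the joint probability at lag $-\beta$ is the transpose of the joint probability at lag $\beta$, so evenness in the lag holds exactly when the joint probability is symmetrical in its two arguments. All names are illustrative.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def joint_pmf(transition, stationary, lag):
    """J[a][b] = Pr[x(t) = a, x(t + lag) = b] for a stationary chain, lag >= 0."""
    n = len(transition)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(lag):
        power = mat_mul(power, transition)
    return [[stationary[a] * power[a][b] for b in range(n)] for a in range(n)]

def is_even_in_lag(transition, stationary, lag, tol=1e-12):
    """True when J(lag) equals its transpose, i.e. the joint law at -lag."""
    j = joint_pmf(transition, stationary, lag)
    n = len(j)
    return all(abs(j[a][b] - j[b][a]) <= tol for a in range(n) for b in range(n))

if __name__ == "__main__":
    uniform = [1.0 / 3.0] * 3
    symmetric = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
    drifting = [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.8, 0.1, 0.1]]
    print(is_even_in_lag(symmetric, uniform, 3))   # symmetric joint law
    print(is_even_in_lag(drifting, uniform, 1))    # stationary but not symmetric
```

Both chains are stationary under the uniform distribution, so the second case shows that condition 1) alone is not enough: a process with a preferred direction of circulation fails condition 2) and is not even in the lag.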


APPENDIX II 


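We wish to show that the solution of (31) also satisfies
(25). To see this, we operate on both sides of (31) by
$[DP(-d^2/dt^2)]$. Hence

$$\left[ DP\!\left( -\frac{d^2}{dt^2} \right) \right] \int_0^T \hat{w}(|t - \tau|)\, K(\tau)\, d\tau = \left[ DP\!\left( -\frac{d^2}{dt^2} \right) \right] y(t + \alpha), \qquad 0 \le t \le T. \tag{55}$$

The left-hand side of (55) is a column matrix whose
elements are

$$\sum_{j=1}^{N} D_{ij} P_{ij}\!\left( -\frac{d^2}{dt^2} \right) \int_0^T \hat{w}(|t - \tau|)\, K_j(\tau)\, d\tau, \qquad (i = 1, 2, \ldots, N),$$

or

$$\sum_{j=1}^{N} \int_0^T w_{ij}(|t - \tau|)\, K_j(\tau)\, d\tau, \qquad (i = 1, 2, \ldots, N). \tag{56}$$

In view of (30), the right-hand side of (55) is $z(t + \alpha)$.
Eq. (55) therefore reduces to

$$\sum_{j=1}^{N} \int_0^T w_{ij}(|t - \tau|)\, K_j(\tau)\, d\tau = z_i(t + \alpha), \qquad 0 \le t \le T, \quad (i = 1, 2, \ldots, N),$$

which is, in fact, (25).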
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We wish to show that if | K(r) | < ~ 0 <r < T), 
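the solution of

$$y(t + \alpha) = \int_0^T \hat{w}(|t - \tau|)\, K(\tau)\, d\tau, \qquad 0 \le t \le T, \tag{57}$$

where

$$\hat{w}(|t|) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{e^{j\lambda t}}{Q(\lambda^2)}\, d\lambda, \tag{58}$$

satisfies

$$K(t) = Q\!\left( -\frac{d^2}{dt^2} \right) y(t + \alpha), \qquad 0 < t < T. \tag{59}$$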


Since &(t) is of the form 


Avert ee 


i=1 


it is readily verified by taking the limits of the derivatives
of any order on both sides of $t = 0$ that

$$\hat{w}^{(k)}(0+) = (-1)^k\, \hat{w}^{(k)}(0-). \tag{60}$$


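Eq. (57) can be written

$$y(t + \alpha) = \int_0^t \hat{w}(t - \tau)\, K(\tau)\, d\tau + \int_t^T \hat{w}(\tau - t)\, K(\tau)\, d\tau, \qquad 0 \le t \le T. \tag{61}$$

Taking derivatives on both sides with respect to $t$ yields

$$y^{(1)}(t + \alpha) = [\hat{w}(0+) - \hat{w}(0-)]\, K(t) + \int_0^t \hat{w}^{(1)}(t - \tau)\, K(\tau)\, d\tau - \int_t^T \hat{w}^{(1)}(\tau - t)\, K(\tau)\, d\tau, \qquad 0 \le t \le T. \tag{62}$$

In view of (60), the first term vanishes and we are
left with

$$y^{(1)}(t + \alpha) = \int_0^t \hat{w}^{(1)}(t - \tau)\, K(\tau)\, d\tau - \int_t^T \hat{w}^{(1)}(\tau - t)\, K(\tau)\, d\tau, \qquad 0 \le t \le T. \tag{63}$$

Similarly,

$$y^{(2)}(t + \alpha) = [\hat{w}^{(1)}(0+) + \hat{w}^{(1)}(0-)]\, K(t) + \int_0^t \hat{w}^{(2)}(t - \tau)\, K(\tau)\, d\tau + \int_t^T \hat{w}^{(2)}(\tau - t)\, K(\tau)\, d\tau, \qquad 0 \le t \le T. \tag{64}$$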




he first term again vanishes and (64) is reduced to 


at 


PU +a) = | 


0 


oS t)K(r) dr 


oe +a) = ti OG — 7)K(z) dr 


‘or even k, we can write (66) as 


a 


ia sie a) ae i a pe |)K(r) dr 


Omen 
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‘he Probability Density of the Phase 
Difference of a Narrow-Band Gaus- 
ian Noise with Sinusoidal Signal* 


The first-order amplitude and phase¹
probability densities for a stationary narrow-band
Gaussian noise both with and without
a sinusoidal signal have been reported by
several people.²,³ The second-order probability
density function of phase for noise
alone was evaluated by MacDonald.⁴ We
have calculated the second-order probability
function for phase with sinusoidal signal by
approximation techniques and have numerically
evaluated the probability density
function for a phase difference between
two instants of time for various parameter
values. Values used are those applicable to
a specific signal processing problem for
which the work was done.⁵ The numerical
results, however, are presented in such a
form that they may be of interest to workers
in many fields to which statistical communication
theory is applicable.

It is noted that this calculation is also 
pplicable to the statistics of the outputs 


* Received by the PGIT, October 25, 1960;
revised manuscript received, November 21, 1960.

1 "Phase" throughout this note means "phase,
modulo 2π." We do not concern ourselves with phase
outside the interval 0 - 2π.

2 W. R. Bennett, "Methods of solving noise
problems," Proc. IRE, vol. 44, pp. 609-638; May,
1956.

3 W. B. Davenport, Jr., and W. L. Root, "An
Introduction to the Theory of Random Signals
and Noise," McGraw-Hill Book Co., Inc., New
York, N. Y., sect. 8-6, pp. 165-167; 1958.

4 D. K. C. MacDonald, "Some statistical properties
of random noise," Proc. Cambridge Phil.
Soc., vol. 45, pp. 368-372; July, 1949.

5 H. R. Raemer and R. Blyth, "Mathematical
Model of Sonar Target Returns in Reverberation
and Water Noise," Cook Res. Labs., Morton Grove,
Ill., Phase I, Final Rept., Pt. II, Contract NObsr
614, U. S. Navy Bur. of Ships; March 29, 1960.


ve 
+ [ 6% — )KG) dr, 0<1<T. 


} ollowing the same reasoning, one sees that 


+ (=1' [a = OKG) ar, 0< 


of a two-channel differencing device, where 
the inputs to the channel are identical 
sinusoidal signals with different Gaussian 
noises, the latter not necessarily being 
statistically independent. 


REPRESENTATION OF THE 
SIGNAL-PLUS-NOISE WAVEFORM


Consider a waveform $v(t)$ consisting of a
sinusoidal signal $A \cos \omega_0 t + B \sin \omega_0 t$
superposed on a narrow-band stationary
Gaussian noise $n(t)$ with zero mean,
variance $\sigma^2$, bandwidth $\Delta\omega$, and center
frequency $\omega_1$.

Many writers⁶ have represented such a
noise by

$$n(t) = x_n(t) \cos \omega_1 t + y_n(t) \sin \omega_1 t \tag{1}$$

where $x_n(t)$ and $y_n(t)$ are random time
functions varying slowly compared to $\cos \omega_1 t$
and $\sin \omega_1 t$. With this representation,
the waveform $v(t)$ is given by

$$v(t) = x(t) \cos \omega_0 t + y(t) \sin \omega_0 t \tag{2}$$

where

$$x(t) = x_n(t) \cos (\omega_1 - \omega_0)t + y_n(t) \sin (\omega_1 - \omega_0)t + A$$
$$y(t) = -x_n(t) \sin (\omega_1 - \omega_0)t + y_n(t) \cos (\omega_1 - \omega_0)t + B$$

or alternatively by

$$v(t) = \rho(t) \cos (\omega_0 t - \phi(t)) \tag{3}$$

6 E.g., Davenport and Root, op. cit., sect. 8.5,
p. 158.
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From (58), we see that 
alk) (| , 1 iL (jr) jhe ; pens 
ore |\= dl G03 ° dn; k = even (68) 
(65) k< 2d. 
Consequently, if we operate on both sides of (57) by 
Q(—d’/dt’), the denominator Q(d”) vanishes and the 
resultant equation has the form 
( d’ : Soe 
——; t = / K d f ao NG) dy 
ee 5 G5) : et +4) ee ree 
=K), Os 757, 
where use has been made of the relation 
ee ee b(t — 7). 
¢ = even. (67) OT Soe 


where

$$x(t) = \rho(t) \cos \phi(t)$$
$$y(t) = \rho(t) \sin \phi(t).$$

The probability density function of $\phi$ is
illustrated in Fig. 1.


SUMMARY OF CALCULATION 


The details of the calculation were pre- 
sented earlier.* The procedure followed was: 


1) The expression of the second-order statistics of $v(t)$ in terms of the quadrivariate Gaussian probability density function $p_2(\rho_1, \phi_1, \rho_2, \phi_2; \tau)$, where the subscripts 1 and 2 on $\rho$ and $\phi$ refer to time instants $t_1$ and $t_2$, respectively, and $\tau$ is the separation time $(t_2 - t_1)$.

2) The approximate analytical integration of $p_2(\rho_1, \phi_1, \rho_2, \phi_2; \tau)$ over $\rho_1$ and $\rho_2$ to obtain the second-order phase probability density function $p_2(\phi_1, \phi_2; \tau)$.

3) The numerical integration of $p_2(\phi_1, \phi_1 + \Delta\phi; \tau)$ over $\phi_1$ to obtain $p(\Delta\phi; \tau)$, the probability density function for the phase difference $\Delta\phi = \phi_2 - \phi_1$.
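The three steps above are analytical and numerical integrations. As a rough cross-check of the kind of result they produce, the following Python sketch instead draws samples of $\Delta\phi$ by Monte Carlo. It assumes (these are simplifications, not the authors' procedure) that the noise is centered on the signal frequency, that the in-phase and quadrature noise components are uncorrelated with each other, and that each has normalized correlation `psi` between the two time instants. All names are illustrative.

```python
import math
import random


def phase_difference_samples(snr_db, psi, n_samples=20000, seed=0):
    """Draw phase differences (radians, wrapped to (-pi, pi]) for signal plus noise."""
    rng = random.Random(seed)
    sigma = 1.0                                   # rms noise
    amplitude = sigma * 10.0 ** (snr_db / 20.0)   # peak signal amplitude
    spread = math.sqrt(1.0 - psi * psi)
    samples = []
    for _ in range(n_samples):
        # Noise components at the first instant.
        x1 = rng.gauss(0.0, sigma)
        y1 = rng.gauss(0.0, sigma)
        # Components at the second instant, correlated with the first by psi.
        x2 = psi * x1 + spread * rng.gauss(0.0, sigma)
        y2 = psi * y1 + spread * rng.gauss(0.0, sigma)
        # The signal phase is taken as zero; it cancels in the difference.
        phi1 = math.atan2(y1, amplitude + x1)
        phi2 = math.atan2(y2, amplitude + x2)
        delta = phi2 - phi1
        samples.append(math.atan2(math.sin(delta), math.cos(delta)))
    return samples


def probability_within(samples, degrees):
    """Fraction of samples with |delta phi| no larger than the given angle."""
    limit = math.radians(degrees)
    return sum(1 for d in samples if abs(d) <= limit) / len(samples)


high = probability_within(phase_difference_samples(-0.5, 0.98), 10.0)
low = probability_within(phase_difference_samples(-0.5, 0.20), 10.0)
```

Here `high` and `low` estimate the probability that $|\Delta\phi|$ is within 10° of zero for strongly and weakly correlated noise, respectively, at a peak signal-to-rms noise ratio of −0.5 db.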


RESULTS 


The results of the numerical computations of $p(\Delta\phi; \tau)$ are contained in Tables I and II and in the curves of Figs. 2-5.⁷ Fig. 6 is a plot of the cumulative distribution function, i.e., the probability that $|\Delta\phi| < \theta$ plotted


7 Figs. 2-5 are plots of probability density in reciprocal degrees vs $\Delta\phi$ in degrees, from $\Delta\phi = 0°$ to $\Delta\phi = 90°$. In all cases $p(\Delta\phi; \tau)$ continues to decrease out to 180° and then begins to increase. It is symmetrical about 180°.


TABLE I
TABULATION OF PROBABILITY DENSITY $p(\Delta\phi; \tau)$
$\psi_1(\tau)$ = ENVELOPE OF NORMALIZED AUTOCORRELATION FUNCTION OF NOISE
NO CORRELATION BETWEEN INPHASE AND QUADRATURE NOISE COMPONENTS

                 Peak Signal / RMS Noise = -0.5 db                    Peak Signal / RMS Noise ≈ -10.4 db
                 $\Delta\phi$ =                                       $\Delta\phi$ =
$\psi_1(\tau)$   0°      3°      10°     20°     40°     90°      |   0°      3°      90°
0.999            4.43    3.25    0.667   0.116   0.163   0.0018   |   3.72    2.98    0.0028
0.970            2.87    2.47    0.955   0.228   0.357   0.0041   |   2.40    2.17    0.006:
0.900            1.54    1.45                            0.0123   |   1.27    1.23    0.0203
0.850            1.030   0.988                           0.0230   |   0.842   0.832   0.0396
0.770            0.805   0.778                           0.0319   |   0.647   0.637   0.0568
0.650            0.618   0.600                           0.0424   |   0.487   0.479   0.0785
0.460            0.459   0.446                           0.0529   |   0.344   0.340   0.1033
TABLE II
TABULATION OF PROBABILITY DENSITY $p(\Delta\phi; \tau)$
$\psi_1(\tau)$ = ENVELOPE OF NORMALIZED CORRELATION FUNCTION OF NOISE
NO CORRELATION BETWEEN INPHASE AND QUADRATURE NOISE COMPONENTS
Peak Signal Peak Signal 
| SS =~ —10.4 db =) HCl 
RMS Noise RMS Noise 
¥7(7) | Ad =0° Ad =3° Ad = 10° Ad = 20° Ag = 40° Ag = 90° Ag = 0° Ag = 3° Ad = 10° Ad = 20° Ad = 40° Ag = 90° 
0.980 2.61 Dez, Fla 33 0.281 0.476 0.0055 3.11 2.63 0.912 0.203 0.318 0.0035 
0.925 1.29 1.24 0.506 0.020 1.56 1.46 0.0121 
0.640 | 0.470 0.466 0.081 0.598 0.585 0.0432 
0.210 0.230 0.227 0.121 0.330 0.320 0.0590 
0.0775 0.189 0.189 0.123 0.286 0.276 0.0592 
0.0655 0.187 0.185 0.123 0.283 0.272 0.0592 
0.0464 0.182 0.180 0.123 0.276 0.269 0.0592 




against $\theta$. To render the numerical results applicable to situations other than the specialized problem whose parameters were used in the computer, they are presented for certain positive values of $\psi_1(\tau)$, the envelope of the normalized autocorrelation function of the noise waveform,⁸ without reference to the physical parameter values on which basis these numbers were chosen.

Peak signal-to-rms noise ratios for which numerical work was done are $\approx -0.5$ db (signal slightly below noise, Figs. 2 and 3) and $\approx -10.4$ db (signal far below noise, Figs. 4 and 5).

Consider first the case where signal is slightly below noise. It is apparent from Figs. 2 and 3 that the probability density is sharply peaked at $\Delta\phi = 0$ if the noise waveforms at $t_1$ and $t_1 + \tau$ are highly correlated (i.e., $\psi_1(\tau) \gtrsim 0.8$), and nearly flat when they are only slightly correlated (i.e., $\psi_1(\tau) \lesssim 0.3$). The same applies qualitatively when the signal is far below noise, but as we would expect, the difference between high and low correlations is somewhat less pronounced, the waveform being almost pure noise and its phase being, therefore, more nearly random. The curves of Figs. 4 and 5 show that the density function flattens out even for relatively high correlations, i.e., when $\psi_1(\tau) = 0.64$. Fig. 6 shows the probability that $\Delta\phi$ is within 10° of zero to be between 0.05 and 0.15 in cases where signal is well below noise and somewhat higher (between 0.25 and 0.35) when the signal is comparable to noise.

Note that the functional relationship with SNR of the probability density function of $\Delta\phi$ is similar to that of the first-order phase probability density $p(\phi)$ shown in Fig. 1. The latter function, however, uses the phase of the pure signal as the zero reference for the phase $\phi$. The signal phase is subtracted out in $\Delta\phi$ and, therefore, the statistics of $\Delta\phi$ are independent of it. Thus, the use of $\Delta\phi$ in signal processing schemes would be advantageous over the use of $\phi$, since the observer usually has no basis for the choice of a zero reference level for phase.

Several months after completion of the reported analysis, the authors learned of some Russian work by Tsvetnov⁹ and Aleksandrov¹⁰ that duplicates some of the results here. However, the calculation procedure was different. Both Tsvetnov and Aleksandrov calculated the probability density $p(\Delta\phi)$ directly by integrating out first $\phi_1$ and then $\rho_1$ and $\rho_2$, whereas the authors obtained an approximate result for $p_2(\phi_1, \phi_2)$, expressed it in terms of $\phi_1$ and $\Delta\phi$, and numerically integrated out $\phi_1$. Also, the general calculation of $p_2(\phi_1, \phi_2)$ by the authors did not require the absence of correlation between inphase and quadrature components of the noise, whereas this specialization was made at the outset in the Russian work, and apparently it somewhat simplified the integration procedure.


8 I.e., $\psi_1(\tau)\cos\omega_1\tau = \overline{n(t)\,n(t+\tau)}\,/\,\overline{n^2(t)}$.

9 V. V. Tsvetnov, "Statistical properties of signals and noise in two channel phase systems," Radiotekh. i Elektron., vol. 12, no. 5, pp. 12-30; 1957. (In Russian.)

10 M. S. Aleksandrov, "Distribution of changes in phase difference for fluctuating random signals and correlated random noise," Radio Engrg. and Electronics (Engl. transl. of Radiotekh. i Elektron.), vol. 5, pp. 360-365; 1960.

Prediction for Wide-Sense Markov Processes*


The minimum-mean-square linear estimator of $y(t) = \int_{-\infty}^{\infty} W(t, \tau)x(\tau)\,d\tau$ based on the present and past of the random process $x(t)$ is the solution of a generalized Wiener-Hopf equation. If $W(t, \tau)$ represents a nonrealizable time-varying weighting function and $x(t)$ is nonstationary, solving the Wiener-Hopf equation is difficult.¹ It is even more challenging if the estimate of $y(t)$ is predicated on a fragmentary portion of the past of $x(t)$, as when only sampled (not necessarily instantaneous or periodic) values of $x(t)$ are available.

As we shall show, the optimum estimator of $y(t)$ is readily determined if $x(t)$ is a Markov process in the wide sense. Indeed, for such processes we shall exhibit explicit solutions valid for $x(t)$ and $W(t, \tau)$ of the type discussed in the preceding paragraph. To describe a wide-sense Markov process,² we first define $\hat{E}$, the wide-sense conditional expectation. Let the random variable $z$ and the random process $x(t)$ be of finite mean square. Then, for any arbitrary finite set of $t_k$'s,

$$\hat{E}[z \mid x(t_1), x(t_2), \ldots, x(t_n)] = \sum_{k=1}^{n} a_k x(t_k) \tag{1}$$

with the $a_k$ so chosen that the expectation $E[\,|z - \sum_{k=1}^{n} a_k x(t_k)|^2\,]$ is minimized.


* Received by the PGIT, November 21, 1960; 
revised manuscript received, January, 1961. The 
research reported here was supported by NASA 
Research Grant NsG-2-59, and The University 
of Michigan Inst. of Science and Technology. 

1 See J. Laning and R. Battin, ‘‘Random Processes 
in Automatic Control,’’ McGraw-Hill Book Co., 
Inc., New York, N. Y., sect. 8.5, pp. 329 ff.; 1956. 

2 This concept and the resulting notation are due to J. Doob, "Stochastic Processes," John Wiley and Sons, Inc., New York, N. Y., pp. 77 and 90; 1953.

A special case is obtained by taking $t_1 < t_2 < \cdots < t_{n-1} < t_n$, with $z = x(t_n)$ estimated from $x(t_1), \ldots, x(t_{n-1})$. Then the process $x(t)$ is defined to be wide-sense Markov if any such set of $t$'s yields $a_k = 0$ for $k = 1, 2, \ldots, n - 2$; this is equivalent to

$$\hat{E}[x(t_n) \mid x(t_1), x(t_2), \ldots, x(t_{n-1})] = \hat{E}[x(t_n) \mid x(t_{n-1})] \tag{2}$$


for all $t_1 < t_2 < \cdots < t_n$. Moreover, it may be verified that if $x(t)$ has a continuous correlation, and $A$ is a subset of the real line such that $\sup_{\tau \in A} \tau = \tau^* \le t$,

$$\hat{E}[x(t) \mid x(\tau), \tau \in A] = \hat{E}[x(t) \mid x(\tau^*)] \tag{3}$$

is implied by (2), and conversely.

We remark that $\hat{E}$ is a projection operator which has the special property (3) if $x(t)$ is wide-sense Markov. The value of $\hat{E}$ is easily computed when $\tau^* < t$. In fact, $x(\tau^*)/\{E[|x(\tau^*)|^2]\}^{1/2}$ is a one-member orthonormal family, so that $\hat{E}[x(t) \mid x(\tau^*)]$ is the one-term orthonormal expansion

$$\hat{E}[x(t) \mid x(\tau^*)] = R(t, \tau^*)\,x(\tau^*) \tag{4}$$

where $R(s, t)$ is defined (somewhat unconventionally) as

$$R(s, t) = \frac{E[x(s)\overline{x(t)}]}{E[|x(t)|^2]}. \tag{5}$$

Here the line over a quantity indicates its complex conjugate.

While the wide-sense Markov property is usually not directly verifiable, it is known that $x(t)$ has this property if and only if³

$$R(s, u) = R(s, t)\,R(t, u), \qquad s \le t \le u. \tag{6}$$

If, in addition, $x(t)$ is wide-sense stationary, (6) requires that $E[x(s + t)\overline{x(s)}] = k e^{-c|t|}$, where $k > 0$ and $c$ has a non-negative real part. Thus, wide-sense Markov processes are easily identified in practice.
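To make the factorization test (6) and the defining property (2) concrete, the Python sketch below takes a real, stationary process with correlation $k e^{-c|t|}$; the values of `K`, `C`, and the time points are arbitrary assumptions for illustration. It checks (6) at three ordered instants and then solves the normal equations for the coefficients $a_k$ of (1). For a wide-sense Markov process only the coefficient of the most recent observation should be nonzero. The helper names are hypothetical, and the small Gaussian-elimination routine merely stands in for any linear solver.

```python
import math

K, C = 2.0, 0.7   # hypothetical correlation parameters


def correlation(s, t):
    """E[x(s) x(t)] = K exp(-C |s - t|) for a real stationary process."""
    return K * math.exp(-C * abs(s - t))


def r(s, t):
    """Normalized correlation R(s, t) of (5)."""
    return correlation(s, t) / correlation(t, t)


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for j in range(col, n + 1):
                m[row][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def projection_coefficients(observed_times, target_time):
    """Coefficients a_k minimizing E|x(target) - sum a_k x(t_k)|^2."""
    gram = [[correlation(ti, tj) for tj in observed_times] for ti in observed_times]
    cross = [correlation(target_time, ti) for ti in observed_times]
    return solve(gram, cross)


s, t, u = 0.0, 0.4, 1.1
factorization_gap = abs(r(s, u) - r(s, t) * r(t, u))
coefficients = projection_coefficients([0.0, 0.4, 1.1], 1.5)
```

With these assumed values, the first two coefficients vanish to rounding error and the last equals $e^{-C(1.5 - 1.1)}$, which is $R(1.5, 1.1)$ as (4) predicts.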

If $x(t)$ is available to us for $t \in A$, the minimum-mean-square estimate of the previously defined $y(t)$ is given by⁴

$$\hat{y}(t) = \hat{E}[y(t) \mid x(\tau), \tau \in A]. \tag{7}$$

The set $A$ may be interpreted as needed for various practical applications. For example, the realizability restriction for the filter operating on $x(t)$ corresponds to $A = (-\infty, t]$. The additional constraint of finite memory filtering changes $A$ to $[t - T, t]$, where $T$ is the filter memory. A third alternative of importance is that of an $x(t)$ sampled before becoming available to a realizable filter; then $A$ is the subset of $(-\infty, t]$ which contains only the sampling points⁵ (or sampling intervals in the case of nonzero width samples).

3 Ibid., p. 233. Here, as in (5), we take $R(s, t) = 0$ if $E[|x(t)|^2] = 0$.

4 Ibid., pp. 76-77.

5 The sampled signal is properly viewed as a discrete sequence $\{x(t_k)\}$, where the $t_k$ are the sampling times.

We now find the optimum linear filter, when realizability is the only constraint. Substituting for $y(t)$ in (7) and using the distributivity of $\hat{E}$ yields

$$\hat{y}(t) = \hat{E}\left[\int_{-\infty}^{t} W(t, \tau)x(\tau)\,d\tau \,\Big|\, x(u), u \le t\right] + \hat{E}\left[\int_{t}^{\infty} W(t, \tau)x(\tau)\,d\tau \,\Big|\, x(u), u \le t\right]. \tag{8}$$

Now, the operator $\hat{E}$ and the integration commute under conditions met by most $W(t, \tau)$.⁶ In the first term of (8), we then have $\hat{E}[x(\tau) \mid x(u), u \le t] = x(\tau)$, since $\tau \le t$. In the second, we find that $\hat{E}[x(\tau) \mid x(u), u \le t] = \hat{E}[x(\tau) \mid x(t)] = R(\tau, t)x(t)$, in accordance with (3) and (4). The result of our computation is therefore

$$\hat{y}(t) = \int_{-\infty}^{t} W(t, \tau)x(\tau)\,d\tau + x(t)\int_{t}^{\infty} W(t, \tau)R(\tau, t)\,d\tau \tag{9}$$

so that the optimum realizable filter $G(t, \tau)$ becomes

$$G(t, \tau) = W(t, \tau)U(t - \tau) + \left[\int_{t}^{\infty} W(t, v)R(v, t)\,dv\right]\delta(t - \tau). \tag{10}$$

Here, $U(\cdot)$ and $\delta(\cdot)$ are the usual unit step and delta functions, respectively.

Next, we solve the same problem as above, except that a finite memory filter is assumed. To accomplish this, we require the following assertion: if $x(t)$ is wide-sense Markov, and $B$ is a subset of the real line such that $\inf_{\tau \in B} \tau = \tau_* \ge t$,

$$\hat{E}[x(t) \mid x(\tau), \tau \in B] = \hat{E}[x(t) \mid x(\tau_*)]. \tag{11}$$

The proof of (11) is as follows. In the first place, if $x(t)$ is wide-sense Markov, so is $x(-t)$; this is a consequence of (6). Applying (3) to $x(-t)$ gives $\hat{E}[x(-t) \mid x(-\tau), -\tau \in -B] = \hat{E}[x(-t) \mid x((-\tau)^*)]$ whenever $(-\tau)^* = \sup_{\tau \in B}(-\tau) \le -t$. Changing back to positive arguments yields the desired result, with $\tau_* = \inf_{\tau \in B} \tau = -\sup_{\tau \in B}(-\tau) = -(-\tau)^*$.

We now proceed as in the infinite memory case, except that (8) is replaced by

$$\hat{y}(t) = \hat{E}\left[\int_{-\infty}^{t-T} W(t, \tau)x(\tau)\,d\tau \,\Big|\, x(u), t - T \le u \le t\right] + \hat{E}\left[\int_{t-T}^{t} W(t, \tau)x(\tau)\,d\tau \,\Big|\, x(u), t - T \le u \le t\right] + \hat{E}\left[\int_{t}^{\infty} W(t, \tau)x(\tau)\,d\tau \,\Big|\, x(u), t - T \le u \le t\right]. \tag{12}$$

The second and third terms of (12) are analogous to the two terms of (8), and are treated accordingly. The first term is simplified through use of (11). There results

$$\hat{y}(t) = \int_{t-T}^{t} W(t, \tau)x(\tau)\,d\tau + x(t - T)\int_{-\infty}^{t-T} W(t, \tau)R(\tau, t - T)\,d\tau + x(t)\int_{t}^{\infty} W(t, \tau)R(\tau, t)\,d\tau, \tag{13}$$

the corresponding weighting function being

$$G(t, \tau) = W(t, \tau)[U(t - \tau) - U(t - T - \tau)] + \left[\int_{-\infty}^{t-T} W(t, v)R(v, t - T)\,dv\right]\delta(t - T - \tau) + \left[\int_{t}^{\infty} W(t, v)R(v, t)\,dv\right]\delta(t - \tau). \tag{14}$$

The technique just presented is also useful in predicting $y(t)$ from a single sample. For instance, one instantaneous sample occurring at time $t_0$ makes $\hat{y}(t) = x(t_0)\int_{-\infty}^{\infty} W(t, \tau)R(\tau, t_0)\,d\tau$. A one-shot sample lasting from $t_1$ to $t_2$ will give a $\hat{y}(t)$ similar to (13); we need only replace $t - T$ by $t_1$ throughout, and $t$ by $t_2$ in all limits of integration and in the $x(t)$ and $R(\tau, t)$ appearing in the third term of (13).

The optimum weighting function is now

$$G(t, \tau) = W(t, \tau)U(t - \tau) + \left[\int_{-\infty}^{t_1} W(t, v)R(v, t_1)\,dv\right]\delta(t_1 - \tau) + \left[\int_{t_2}^{\infty} W(t, v)R(v, t_2)\,dv\right]\delta(t_2 - \tau). \tag{15}$$

Note that the first term contains $U(t - \tau)$ rather than $U(t_2 - \tau) - U(t_1 - \tau)$, as might have been expected. This is purely a matter of convenience, since the filter receives no input before $t_1$ or after $t_2$, so that $\hat{y}(t)$ remains unchanged if $G(t, \tau)$ is arbitrarily specified for $\tau$ outside this interval.

6 A sufficient (but by no means necessary) condition for commutativity is as follows: the correlation function of $x(t)$ is continuous [as is already assumed for (3)], and $W(t, \tau)$ is absolutely integrable in $\tau$ for every $t$.
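As a small numerical illustration of (10), the Python sketch below evaluates the coefficient of the delta function, $\int_t^\infty W(t, v)R(v, t)\,dv$, by the trapezoidal rule. The weighting function $W(t, v) = e^{-|t - v|}$ and the stationary $R(s, t) = e^{-c|s - t|}$ are assumptions chosen only because the integral then has the closed form $1/(1 + c)$; the function names and the truncation of the infinite upper limit are likewise illustrative.

```python
import math


def delta_weight(w, r, t, upper, steps=200000):
    """Trapezoidal estimate of the integral of W(t, v) R(v, t) over v in [t, upper]."""
    width = (upper - t) / steps
    total = 0.5 * (w(t, t) * r(t, t) + w(t, upper) * r(upper, t))
    for i in range(1, steps):
        v = t + i * width
        total += w(t, v) * r(v, t)
    return total * width


c = 0.5                                              # hypothetical decay rate of the correlation
w = lambda t, v: math.exp(-abs(t - v))               # hypothetical two-sided weighting function
r = lambda s, t: math.exp(-c * abs(s - t))           # stationary wide-sense Markov R(s, t)

# The upper limit 40.0 stands in for infinity; the integrand is negligible beyond it.
weight = delta_weight(w, r, t=0.0, upper=40.0)
closed_form = 1.0 / (1.0 + c)
```

Under these assumptions the optimum realizable filter passes the past of $x$ through $W$ unchanged and adds the present value $x(t)$ scaled by `weight`, which stands in for the unobservable future.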

Finally, we turn our attention to filtering a train of samples of $x(t)$. The methods which we have discussed here fail unless $W(t, \tau) = 0$ for $\tau < t$.⁷ In spite of this restriction, it remains possible to deal with recovery of $x(t)$, prediction and/or differentiation (in fact, any totally nonrealizable operation on $x(t)$). Denoting the intersection of the sampling times and $(-\infty, t]$ by $A$, the assumption on $W(t, \tau)$ puts (7) in the form

$$\hat{y}(t) = \hat{E}\left[\int_{t}^{\infty} W(t, \tau)x(\tau)\,d\tau \,\Big|\, x(u), u \in A\right]. \tag{16}$$

Now $u \le \tau$ for all $u \in A$, so that $\hat{E}[x(\tau) \mid x(u), u \in A] = R(\tau, \tau^*)x(\tau^*)$ from (3) and (4). Hence, we obtain

$$\hat{y}(t) = x(\tau^*)\int_{t}^{\infty} W(t, \tau)R(\tau, \tau^*)\,d\tau \tag{17}$$

where $\tau^*$ is the time of the trailing edge of the last sampling pulse occurring before time $t$ (unless $t$ is part of a sampling pulse, in which case $\tau^* = t$). It is clear from this result that only the trailing edge of the last sample is of importance; no useful contribution is obtained from the other (previous) pulses, regardless of their frequency or duration.

We shall apply the results of the preceding analysis to an example of prediction. First, however, let us establish the general solution for the optimum predictor. Prediction means that $y(t) = x(t + a)$, where $a > 0$ is the prediction interval. This $y(t)$ corresponds to $W(t, \tau) = \delta(t + a - \tau)$. If the entire past of $x(t)$ can be used for prediction, substitution in (9) gives

$$\hat{x}(t + a) = R(t + a, t)\,x(t). \tag{18}$$

This states that an amplifier with time-varying gain $R(t + a, t)$ is applied to $x(t)$ to give the best estimate, $\hat{x}(t + a)$. It is clear that the same amplifier is also optimum with respect to any finite-memory class of linear filters.

When only sampled values of $x(t)$ are available, (17) is used, and

$$\hat{x}(t + a) = R(t + a, \tau^*)\,x(\tau^*) \tag{19}$$

where $\tau^*$ is the time of the trailing edge of the last sampling pulse occurring before time $t$ (unless $t$ is part of the sampling pulse, in which case $\tau^* = t$).

7 Actually, it would suffice if $G(t, \tau)$ is zero for $\tau_* < \tau < \tau^*$. The computations are then simplified by use of both (3) and (11). Difficulties arise only when $A$ and its complement are "interleaved."

More specifically, consider the random process $x(t)$ defined for $t \ge 0$ by the differential equation

$$\frac{dx}{dt} + f(t)\,x = h(t)\,z(t), \qquad x(0) = 0 \tag{20}$$

in which $z(t)$ is a real random process with zero mean and covariance $E[z(s)z(t)] = \delta(s - t)$, $f(t)$ is any continuous function, and $h(t)$ is any measurable function.

Evidently, x(t) is the output of a time- 
varying system whose input consists of 
white noise modulated by h(t). For instance, 
h(t) becomes a pulse train if the system 
receives its input through a sampling 
switch. 

We establish that $x(t)$ is wide-sense Markov by verifying (6). For convenience, we define $g(u, v) = \exp[-\int_u^v f(\tau)\,d\tau]$ and note that $g(u, w) = g(u, v)g(v, w)$. A straightforward computation then yields

$$E[x(s)x(t)] = g(0, s)\,g(0, t)\int_0^{\min(s,t)} g^2(\tau, 0)\,h^2(\tau)\,d\tau = g(0, t)\int_0^{\min(s,t)} g(\tau, s)\,g(\tau, 0)\,h^2(\tau)\,d\tau \tag{21}$$

so that for $s < t$,

$$R(s, t) = \frac{\int_0^{s} g(\tau, s)\,g(\tau, 0)\,h^2(\tau)\,d\tau}{\int_0^{t} g(\tau, t)\,g(\tau, 0)\,h^2(\tau)\,d\tau}. \tag{22}$$

If the denominator of (22) is zero, we have the convention $R(s, t) = 0$. A direct substitution of (22) into (6) will now exhibit $x(t)$ as a wide-sense Markov process.

The optimum prediction of $x(t + a)$ is obtained by substituting into (18). After simplifying the resulting expression, we see that the predictor is an amplifier with gain⁸

$$R(t + a, t) = \exp\left[-\int_t^{t+a} f(\tau)\,d\tau\right] \tag{23}$$

which is calculated from (21). The result (23) holds only if $\int_0^t |h(\tau)|\,d\tau > 0$, for otherwise $\int_0^t g^2(\tau, 0)h^2(\tau)\,d\tau = 0$, implying $R(t + a, t) = 0$; the latter gives $\hat{x}(t + a) = 0$, as indeed it should.

If the prediction of $x(t + a)$ is based only on samples of $x(t)$, (19) is applicable. Now $\hat{x}(t + a) = x(\tau^*)\exp[-\int_{\tau^*}^{t+a} f(\tau)\,d\tau]$ if $\int_0^{\tau^*} |h(\tau)|\,d\tau > 0$, and $\hat{x}(t + a) = 0$ otherwise.
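The gain (23) can be checked numerically from the covariance (21). The Python sketch below is a minimal illustration under assumed inputs: the coefficient $f(t) = 1 + 0.5t$, the modulation $h(t) = 1$, and the instants $t = 1$, $a = 0.5$ are arbitrary choices, and the function names are hypothetical. It builds $E[x(s)x(t)]$ by Simpson integration, forms $R(t + a, t)$ as in (5), and compares the result with the closed form of (23).

```python
import math


def simpson(fn, lo, hi, steps=200):
    """Composite Simpson rule (steps even); a reversed interval gives a signed result."""
    if hi == lo:
        return 0.0
    width = (hi - lo) / steps
    total = fn(lo) + fn(hi)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * fn(lo + i * width)
    return total * width / 3.0


def g(u, v, f):
    """g(u, v) = exp[-integral of f from u to v]."""
    return math.exp(-simpson(f, u, v))


def covariance(s, t, f, h):
    """E[x(s) x(t)] from the first form of (21)."""
    inner = simpson(lambda tau: (g(tau, 0.0, f) * h(tau)) ** 2, 0.0, min(s, t))
    return g(0.0, s, f) * g(0.0, t, f) * inner


def predictor_gain(t, a, f, h):
    """R(t + a, t) = E[x(t + a) x(t)] / E[x(t)^2], as defined in (5)."""
    return covariance(t + a, t, f, h) / covariance(t, t, f, h)


f = lambda tau: 1.0 + 0.5 * tau   # hypothetical time-varying coefficient
h = lambda tau: 1.0               # unmodulated white-noise input
gain = predictor_gain(1.0, 0.5, f, h)
closed_form = math.exp(-0.8125)   # exp[-integral of f from 1.0 to 1.5]
```

The modulation $h$ cancels in the ratio, which is why the gain in (23) depends on $f$ alone whenever the denominator is nonzero.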


FREDERICK J. BEUTLER 
College of Engrg. 

The University of Michigan 
Ann Arbor, Mich. 


8 This generalizes the simplest nontrivial example of Wiener prediction involving a wide-sense stationary process with correlation $e^{-c|\tau|}$, $c > 0$. Its solution is an amplifier with gain $e^{-ca}$. The same result is obtained in our problem by specializing to $f(t) = c$, $h(t) = 1$ and starting the process at $t = -\infty$.

Optimum Prefiltering of Sampled Data*


Let $f$ denote a stochastic signal and $n$ denote additive noise; consider a prefilter which prepares $f + n$ for sampling. The sampled data is interpolated to recover an estimate of $f$. Optimum interpolation is discussed by Stewart,¹ and an aspect of optimum prefiltering is discussed by Spilker, Jr.²

Fig. 1 is a block diagram of the system.

Fig. 1. [Block diagram: input $f + n$, prefilter $Y$, prefilter output $w$, sampling, output $h$.]


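If the sample times are $kT$, then $h(t) = \sum_{k=-\infty}^{\infty} w(kT)K(t - kT)$, which is to be an estimate of $f$. For any prefilter frequency response $Y$, the optimum $Z(\omega)$, where $Z$ is the Fourier transform of $K$, is found to be

$$Z(\omega) = \frac{T\,\overline{Y(\omega)}\,G_f(\omega)}{\sum_{k=-\infty}^{\infty}\left[G_f\left(\omega + \frac{2\pi k}{T}\right) + G_n\left(\omega + \frac{2\pi k}{T}\right)\right]\left|Y\left(\omega + \frac{2\pi k}{T}\right)\right|^2}$$

where $G$ denotes power density spectrum. The optimization is in the least-square sense and numerous familiar assumptions are made, stationarity, etc. The corresponding mean-square error is

$$\overline{\epsilon^2} = \frac{1}{2\pi}\int_{-\infty}^{\infty}\left\{G_f(\omega) - \frac{G_f^2(\omega)\,|Y(\omega)|^2}{\sum_{k=-\infty}^{\infty}\left[G_f\left(\omega + \frac{2\pi k}{T}\right) + G_n\left(\omega + \frac{2\pi k}{T}\right)\right]\left|Y\left(\omega + \frac{2\pi k}{T}\right)\right|^2}\right\}d\omega.$$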

The problem of interest here is to find the $Y$ which minimizes this error. Suppose $G_n = 0$; then if no prefiltering is used ($Y(\omega) = 1$), there is no error at the sample times. However, it is proved below that this exactness at the sample times should always be sacrificed if minimum average-square error is to be obtained. Specifically, the optimum prefilter has a pass band of $T^{-1}$ cps for any $G_f$ and $G_n$.

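We wish to maximize the integral of the second term in the expression for $\overline{\epsilon^2}$. Since its denominator is periodic with period $2\pi/T$, the quantity to be maximized can be written thus:

$$\int_{-\pi/T}^{\pi/T} \frac{\sum_{k=-\infty}^{\infty} G_f^2\left(\omega + \frac{2\pi k}{T}\right)\left|Y\left(\omega + \frac{2\pi k}{T}\right)\right|^2}{\sum_{k=-\infty}^{\infty}\left[G_f\left(\omega + \frac{2\pi k}{T}\right) + G_n\left(\omega + \frac{2\pi k}{T}\right)\right]\left|Y\left(\omega + \frac{2\pi k}{T}\right)\right|^2}\,d\omega$$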


* Received by the PGIT, December 5, 
revised manuscript received, February 6, 1961 

1R. M. Stewart, “Statistical design and evalua- 
tion of filters for the restoration of sampled data,” 
Proc. IRE, vol. 44, pp. 253-257; February, 1956. 

2 J. J. Spilker, Jr., "Theoretical bounds on the performance of sampled data communications systems," IRE TRANS. ON CIRCUIT THEORY, vol. CT-7, pp. 335-341; September, 1960. In this paper, finite (positive) width samples are considered, and only an approximate analysis is made of optimum prefiltering.

We can consider any specific $|\omega| < \pi/T$ and now pick the values of $|Y(\omega + 2\pi k/T)|^2$ for all $k$ to maximize the ratio

$$\frac{\sum_{k=-\infty}^{\infty} \dfrac{G_f^2(\omega + 2\pi k/T)}{G_f(\omega + 2\pi k/T) + G_n(\omega + 2\pi k/T)}\,\theta_k}{\sum_{k=-\infty}^{\infty} \theta_k}$$

where $\theta_k$ has been defined as the $k$th term of the sum in the denominator. For the moment, let $\sum_{k=-\infty}^{\infty}\theta_k$ be a fixed positive number. To find the $|Y(\omega + 2\pi k/T)|^2$ it is enough to find the $\theta_k$'s. The numerator is the dot product of a vector in the first hyper-quadrant with the vector $\theta$; each component of $\theta$ is non-negative, and for the moment we fix $\sum_k \theta_k$. In two-dimensional space, the problem is to pick $\theta$ having its tip on the line indicated in Fig. 2 so as to maximize its projection along a fixed vector (which has been denoted by $A$). Clearly,³ the best $\theta$ is obtained by putting all of $\sum_k \theta_k$ on the axis for which $A$ has its largest component; when this is done, the ratio becomes

$$\frac{|A|\,|\theta|\cos\gamma}{\sum_k \theta_k}$$

but with the optimum $\theta$, $|\theta| = \sum_k \theta_k$, and hence the ratio we maximized has the value $|A|\cos\gamma$. This is independent of the value we used for $\sum_k \theta_k$.

Fig. 2. [Typical projections of $\theta$, whose tip lies on the line $\theta_1 + \theta_2$ = constant, onto the fixed vector $A$.]
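The geometric argument above amounts to the observation that a weighted average is largest when all the weight sits on the largest term. The Python sketch below illustrates this with four made-up values standing in for $G_f^2/(G_f + G_n)$ at four aliases; the numbers and function names are assumptions for illustration only. It compares the single-axis choice of $\theta$ against randomly drawn non-negative alternatives.

```python
import random


def ratio(a, theta):
    """Weighted average sum(a_k * theta_k) / sum(theta_k) for non-negative theta."""
    return sum(ak * tk for ak, tk in zip(a, theta)) / sum(theta)


def best_theta(a, total=1.0):
    """Put the whole budget on the index where a is largest."""
    best = max(range(len(a)), key=lambda k: a[k])
    return [total if k == best else 0.0 for k in range(len(a))]


a = [0.2, 0.9, 0.5, 0.1]   # hypothetical values of G_f^2 / (G_f + G_n) at four aliases
optimum = ratio(a, best_theta(a))

# Randomly drawn non-negative weightings never exceed the single-axis choice.
rng = random.Random(1)
trials = [ratio(a, [rng.random() + 1e-9 for _ in a]) for _ in range(1000)]
largest_trial = max(trials)
```

The value of `total` does not affect the ratio, matching the remark that the result is independent of the value used for $\sum_k \theta_k$.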

Hence the optimum values for $|Y(\omega + 2\pi k/T)|$ consist of all zeros except for that $k$ for which

$$\frac{G_f^2(\omega + 2\pi k/T)}{G_f(\omega + 2\pi k/T) + G_n(\omega + 2\pi k/T)}$$

is a maximum. If the maximum is achieved for several terms of the sequence, the values on these terms can be selected arbitrarily (not all zero) with all other terms zero. At the $k$ which provides the maximum, the nonzero value of $|Y(\omega + 2\pi k/T)|$ is not important⁴ (for mathematical reasons the values used should be such that $|Y|$ is a reasonably well-behaved function). Suppose the value 1 is used for $|Y|$. The optimum $Y(\omega)$, of course, now consists of a function which is one on a set of measure $2\pi/T$ and zero elsewhere. The set where $Y$ is one is determined by considering the $\omega$-axis decomposed into a set of equivalence classes where $\omega_1 \sim \omega_2$ if and only if $\omega_2 - \omega_1 = 2\pi k/T$ for some integer $k$, i.e., provided $\omega_1$ and $\omega_2$ are aliases. Then $Y$ is zero on all but one point of each class, and that point is determined by the point (or a point) where $G_f^2/(G_f + G_n)$ has its maximum.

3 The fact that this observation generalizes is well known in the theory of linear programming.

4 If the sampler is noisy, an additional noise is added at the output of the prefilter, and hence $Y$ should be made large if the effect of this additional noise is to be made small.

As an important special case, if $G_f^2/(G_f + G_n)$ (or simply $G_f$ in the noise free case) is even and nonincreasing for positive frequencies, then the best $Y(\omega)$ is

$$Y(\omega) = \begin{cases} 1, & |\omega| < \pi/T \\ 0, & \text{otherwise.} \end{cases}$$
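The selection rule over alias classes can be sketched in a few lines of Python. The spectra here are assumptions made only for illustration: a signal spectrum $G_f(\omega) = 1/(1 + \omega^2)$ and a flat noise level, giving a score $G_f^2/(G_f + G_n)$ that is even and nonincreasing in $|\omega|$. For each base frequency in the band $|\omega| < \pi/T$ the sketch picks the alias with the largest score; the function names and the finite search range `k_max` are hypothetical.

```python
import math


def score(omega, noise_level=0.1):
    """Hypothetical G_f^2 / (G_f + G_n) with G_f = 1 / (1 + omega^2) and flat noise."""
    gf = 1.0 / (1.0 + omega * omega)
    return gf * gf / (gf + noise_level)


def selected_alias(omega, period, k_max=5):
    """Return the k in [-k_max, k_max] whose alias omega + 2*pi*k/period scores highest."""
    step = 2.0 * math.pi / period
    return max(range(-k_max, k_max + 1), key=lambda k: score(omega + k * step))


period = 1.0   # sampling interval T (arbitrary choice)
# Fifty base frequencies strictly inside the band |omega| < pi / T.
base = [(-0.5 + (i + 0.5) / 50.0) * 2.0 * math.pi / period for i in range(50)]
choices = [selected_alias(w, period) for w in base]
```

With a score of this shape every base frequency selects $k = 0$, so the pass band is the low-pass interval $|\omega| < \pi/T$; a score that peaked elsewhere would move parts of the pass band to other aliases.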
Comments On a Paper By Wax*


In a recent article¹ Wax determined upper bounds for the maximum number of code symbols in a Hamming² code of length $n$ and distance $D$. The purpose of this note is to point out a simple argument which will result in an improvement on Wax's bound when $D$ is small compared with $n$. For $D \ge n/2$, the argument leads to the same bound obtained by Wax.

Wax's argument, which depends on the density method of Blichfeldt,³ is best expressed in physical terms: if it can be shown that the density of a body is everywhere $\le 1$, then it follows that the mass of the body is always less than or equal to its volume. Following Wax, we imagine that at each of the code points we have centered a sphere of radius $R = (D/2)^{1/2}$, where the Euclidean distance $d$ corresponding to the Hamming distance $D$ is given by $D^{1/2} = d$. These spheres, which are thought of as superposable, are truncated along the faces of the unit cube and assigned a density as indicated by Wax.

* Received by the PGIT, December 16, 1960.

1 N. Wax, "On upper bounds for error detecting and error correcting codes of finite length," IRE TRANS. ON INFORMATION THEORY, vol. IT-5, pp. 168-174; December, 1959.

2 R. W. Hamming, "Error detecting and error correcting codes," Bell Sys. Tech. J., vol. 29, pp. 147-160; April, 1950.

3 H. F. Blichfeldt, "The minimum value of quadratic forms and the closest packing of spheres," Math. Ann., vol. 101, pp. 605-608; 1929.

Letting $N(n, D)$ denote the maximum number of code points in a space of dimension $n$ whose distance apart is at least $D$, and letting $M(n, D)$ denote the density function, Wax shows that

$$N(n, D)\,M(n, D) \le 1$$

by use of the argument mentioned earlier. The argument consists of showing that even though these new concentric spheres overlap, the choice of the density function $M(n, D)$ is such that the density at any point is less than or equal to unity. This, together with the observation that the truncated spheres lie completely within the unit cube, leads to the bound.

An improvement in the bound can be obtained by a better estimate of the total volume $W$ of the overlapping spheres centered at the code points. We imagine the cube to be centered at the origin so that the code points will always come from the set $(\pm 1/2, \ldots, \pm 1/2)$. Suppose for a moment that we center a truncated sphere of radius $(D/2)^{1/2}$ at each vertex of the cube as though every vertex were a code point. There will, of course, be considerable overlapping, but the total volume $Y$ of these $2^n$ truncated spheres satisfies $W \le Y \le 1$ so that $Y$ bounds $W$. To calculate $Y$, the unit cube is subdivided into $2^n$ cubes of edge 1/2 denoted by $C_i^{1/2}$, $i = 1, 2, \ldots, 2^n$. Associated with these smaller cubes are $3^n$ vertices, namely the points each of whose coordinates is $+1/2$, $0$, or $-1/2$. With each $C_i^{1/2}$ there is associated one and only one vertex $X_i$ from the set $(\pm 1/2, \ldots, \pm 1/2)$. Any point $y$ which belongs to the unit cube must be in the interior or on the boundary of one of the $C_i^{1/2}$. It follows that $y$ will be contained in some sphere if and only if $|y - X_i| \le (D/2)^{1/2}$. In the event that one of the coordinates of $y$ is zero, this coordinate is arbitrarily taken to be positive. It follows that the unit cube is covered by the $2^n$ spheres of radius $(D/2)^{1/2}$ centered at the vertices in the same way as the cube $C_i^{1/2}$ is covered by a single sphere centered at $X_i$. It is shown by Wax that the volume of the truncated sphere of radius $R$ centered at the vertex of a cube of side $b$ is given by $b^n V(n, R/b)$. Letting $b = 1/2$, we obtain

$$N(n, D)\,M(n, D) \le V(n, (2D)^{1/2}).$$

This bound will grow better as $D$ becomes small compared with $n$. For $D \ge n/2$, $V(n, (2D)^{1/2}) = 1$, so that in this case there is no improvement.
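To get a feel for the size of the right-hand side, the Python sketch below estimates $V(n, r)$ by Monte Carlo. It assumes, from the scaling $b^n V(n, R/b)$ quoted above, that $V(n, r)$ is the volume of the part of a sphere of radius $r$, centered at a vertex of the unit $n$-cube, that lies inside the cube; that reading, the function names, and the sample sizes are assumptions rather than Wax's own definitions.

```python
import math
import random


def truncated_sphere_volume(n, radius, samples=20000, seed=0):
    """Monte Carlo estimate of the volume of {y in [0, 1]^n : |y| <= radius}."""
    rng = random.Random(seed)
    limit = radius * radius
    hits = 0
    for _ in range(samples):
        if sum(rng.random() ** 2 for _ in range(n)) <= limit:
            hits += 1
    return hits / samples


def improved_bound(n, d):
    """Estimate V(n, (2D)^(1/2)), the right-hand side of the improved inequality."""
    return truncated_sphere_volume(n, math.sqrt(2.0 * d))


quarter_disc = truncated_sphere_volume(2, 1.0)   # sanity check: exact value is pi / 4
no_gain = improved_bound(4, 2)                   # D = n/2, so the sphere covers the whole cube
small_d = improved_bound(12, 1)                  # D small compared with n
```

When $D \ge n/2$ the radius $(2D)^{1/2}$ reaches the cube diagonal $n^{1/2}$ and the estimate is exactly 1, reproducing the case of no improvement; for $D$ small compared with $n$ the estimate falls well below 1.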


Louris D. Grey
The Teleregister Corp.
Stamford, Conn.

On the Postdetection Correlation Between Two Sinusoidal Signals with Superimposed Correlated Noise*


INTRODUCTION 


The problem of measuring autocorrelation 
and cross-correlation is frequently en- 
countered in experimental analysis. Cor- 
relation measurements are involved in the 
majority of experimental procedures in 
connection with analysis of transmission 
mechanisms, as, for instance, ionospheric 
transmission and scattering of radio waves. 
Another example is the method of finding 
the transfer function of linear four-poles by 
measuring the correlation between input 
and output signals. The results of the 
present paper have also been successfully 
applied to the analysis of signals from 
earth satellites. 

Since most of the signals used in transmis- 
sion techniques are high-frequency signals 
with relatively narrow bandwidths, the 
correlation measurements are most easily 
performed on the detected signals, whereas 
for theoretical calculations the correlation 
between the undetected signals is required. 
It is therefore of great importance to know 
the relationship between the correlation 
before and after detection. 

It is the purpose of this paper to establish this relationship for a special class of signals, namely signals consisting of a sinusoidal component with superimposed narrow-band Gaussian noise. This class of signals is representative of many situations appearing in technical applications.

The mathematical methods for solving such problems are given by Rice¹ and Middleton.²

We consider two signals, $I_1(t)$ and $I_2(t)$, both consisting of a sinusoidal component with narrow-band Gaussian noise superimposed:

$$I_1(t) = S_1(t) + N_1(t)$$

$$I_2(t) = S_2(t) + N_2(t) \tag{1}$$

where

$$S_1(t) = P_1\cos(2\pi f_0 t - \phi_1)$$

$$S_2(t) = P_2\cos(2\pi f_0 t - \phi_2).$$

The noise signals $N_1(t)$ and $N_2(t)$ are supposed to be correlated, both having a narrow power spectrum with center frequency near $f_0$. The power spectra are $w_1(f)$ and $w_2(f)$, respectively. The SNR of the signal $I_1$ is given by


(2) 


where 
i AGE (4) 


* Received by the PGIT, December 19, 1960; 
revised manuscript received, March 14, 1961. 

18. O. Rice, ‘‘Mathematical analysis of random 
noise,” Bell Sys. Tech. J., vol. 23, pp. 282-332, 
July, 1944; vol. 24, pp. 46-156; January, 1945. 

2 D. Middleton, "An Introduction to Statistical Communication Theory," McGraw-Hill Book Co., Inc., New York, N. Y.; 1960.

Similar expressions apply to the second signal. The analysis includes both cross-correlation and autocorrelation. In the latter case, $I_2$ is defined by

$$I_2(t) = I_1(t + \tau). \tag{5}$$

We introduce two quantities $\rho_I$ and $\phi$, defined by


comes 1 
us oa Gare 
[ Voom ap) 6) 


v0 


where $\delta(f)$ is the phase difference between the components of the two noise signals at frequency $f$, and the pointed brackets $\langle\ \rangle$ denote the statistical average. With these notations, the correlation coefficient $\rho_N$ of the two noise signals is given by

$$\rho_N = \rho_I\cos\phi. \tag{7}$$

We shall consider a linear envelope-detector consisting of a linear detector followed by a low-pass filter. The output is then proportional to the envelope of the input signal. We shall also consider a square-law detector followed by a low-pass filter. The output of this device is proportional to the square of the envelope of the input signal. Our task is therefore to calculate the correlation coefficient $\rho_R$ for the two envelopes of the signals, and the correlation coefficient $\rho_{R^2}$ for the squares of the envelopes.


RESULTS OF THE ANALYSIS 


Explicit expressions for $\rho_R$ and $\rho_{R^2}$ may be obtained by the characteristic function method if the properties of the detectors are expressed by their contour integral representation. This method has been described in detail.¹,²

Eq. (8) contains four parameters: the correlation coefficient $\rho_{R^2}$ of the detected signals; the "in-phase" correlation coefficient $\rho_I$ of the undetected signals; the geometric mean of the SNR's, $\sqrt{a_1 a_2}$; and the phase factor $\phi$. In order to determine the correlation between the undetected signals from the measured values of $\rho_{R^2}$, one has to know certain properties of the signals, namely the SNR's and the phase angle $\phi$.

In Fig. 1, the correlation coefficient $\rho_{R^2}$ of the detected signals is plotted vs the correlation coefficient $\rho_I$ with $\sqrt{a_1 a_2}$ and $\phi$ as parameters. Fig. 1 shows the general behavior of the function. Fig. 2 shows a more detailed plot for the special case that the phase angle $\phi$ is equal to zero. This is probably the case most often met with in practical applications. In the case of autocorrelation, it is easily shown that $\phi$ equals zero for a symmetrical noise spectrum.

For the square-law detector, the problem 
leads to integrals which can be evaluated, 
giving a simple expression in closed form 
for the correlation between the detected 
signals. This is not the case for the linear 
detector, for which series expansion must 
be used. 

The exact expression for the general case 
is very complicated and requires electronic 
computers for its evaluation. In Fig. 3 is 
shown a plot of pr as a function of py for 
the special case that ¢ = 0 and a1 = a2 =a. 

Comparison with Fig. 2 shows that there 
is little difference between the correlation 
of the outputs from linear detectors and 
from square-law detectors. 

The series expansion for $\rho_R$ may be
approximated by simpler expressions valid
for certain ranges of the parameters. We
shall state here only the results for the case
of equal SNR's, $a_1 = a_2 = a$, and $\varphi = 0$.
The following three approximate expressions
are then obtained:
We obtain for the square-law detector:

which is valid for small values of $a$, $a \ll 1$;

which is valid for large values of $a$, $a \gg 1$;
and

which is valid for small values of $\rho_f$, $\rho_f \ll 1$.

Fig. 1—Correlation coefficient $\rho_{R^2}$ of the output
from a square-law detector plotted vs the correlation
coefficient $\rho_f$ of the undetected signals.
The parameters are: the phase angle $\varphi$ and the
geometric mean of the SNR's $a_1$ and $a_2$.

Here, $E(\rho_f)$ and $K(\rho_f)$ are the elliptic
integrals, and $I_0(a/2)$ and $I_1(a/2)$ are the
modified Bessel functions. Eq. (12) is valid
with good accuracy even for relatively
large values of $\rho_f$. For any value of the
parameters, the error is less than 9 per cent.
Comparison of (8), (10) and (11) shows
that for zero SNR, $\rho_R$ is less than $\rho_{R^2}$ by
a factor which turns out to be between
0.91 and 1, and for large SNR, $\rho_R$ is larger
than $\rho_{R^2}$ by a factor approaching unity.


INCOHERENT SIGNALS 


Up to this point, we have assumed that
the two signals have the same frequency;
i.e., the two signals are coherent. Two
signals with different frequencies may be
expressed mathematically as having equal
frequencies and a phase difference varying
linearly with time. Thus the angle $\varphi_1 - \varphi_2$,
and hence $\varphi$, is uniformly distributed in the
interval from zero to $2\pi$. The results derived
for coherent signals may therefore be
applied to incoherent signals by averaging
with respect to $\varphi$.

Carrying out the averaging process on 
the results for coherent signals, we obtain: 

Square-law detector:

[Eq. (13), giving $\rho_{R^2}$, is not legible in this copy.]

Linear detector, small values of $a$:

[Eq. (14), giving $\rho_R$, is not legible in this copy.]

Linear detector, large values of $a$:

[Eq. (15), giving $\rho_R$, is not legible in this copy.]

From (13) and (15), we observe that for
equal and very large values of the SNR's
$\rho_{R^2}$ exceeds $\rho_R$ by a factor 4. For this type
of signal, the linear and the square-law
detector behave quite differently.

[Eq. (16) is not legible in this copy.]

Fig. 2—Correlation coefficient $\rho_{R^2}$ of the output
from a square-law detector plotted vs the correlation
coefficient $\rho_f$ of the undetected signals,
for the special case $\varphi = 0$. The parameter $\sqrt{a_1 a_2}$
has the values, starting from the lower curve: 0,
0.1, 0.25, 0.5, 1, 2, 4, and $\infty$.

Fig. 3—The correlation coefficient $\rho_R$ of the output
from a linear detector plotted vs the correlation
coefficient $\rho_f$ of the undetected signals, for the
special case $\varphi = 0$ and $a_1 = a_2 = a$. The parameter
$a$ has the values, starting from the lower
curve: 0, 0.1, 0.25, 0.5, 1, 2, 4, and $\infty$.



KJELL BLOTEKJAER 

Div. for Radar 

Golay’s Complementary Series*


Golay, in a recent paper,1 introduced
the notion of complementary sequences of
0’s and 1’s. He states as a result obtained
by trial that complementary sequences of
length 18 do not exist. We prove this result.

In order to define complementary sequences,
let $S = (s_1, \dots, s_n)$ be any
sequence of 0’s and 1’s. We first define

$L_i(S)$ = the number of values of $j$ for
          which $s_j = s_{j+i}$,
        = the number of like pairs of
          $s_j$’s which are separated by
          distance $i$,
$U_i(S)$ = the number of values of $j$ for
          which $s_j \ne s_{j+i}$,
        = the number of unlike pairs
          of $s_j$’s which are separated by
          distance $i$.


We note that

$$L_i(S) + U_i(S) = \begin{cases} n - i & \text{for } 1 \le i \le n - 1, \\ 0 & \text{for } i \ge n. \end{cases}$$

Now let $A = (a_1, \dots, a_n)$ and $B = (b_1, \dots, b_n)$.
We define equivalence: $A \equiv B$ if

$$L_i(A) = L_i(B) \quad\text{and}\quad U_i(A) = U_i(B)$$

for all $i$. We define complementary sequences: $A \sim B$ if

$$L_i(A) = U_i(B) \quad\text{and}\quad U_i(A) = L_i(B)$$

for all $i$. We see from the equation above
that $L_i(A) = U_i(B)$ for all $i$ implies $A \sim B$.
We also see that $\equiv$ is an equivalence relation,
that $\sim$ is symmetric, and that equivalent
sequences behave identically with
respect to $\sim$.
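
The definitions above translate directly into a few lines of code. The following is a minimal sketch, not part of the original note: the function names and the length-4 example pair are chosen here purely for illustration.

```python
def like_unlike(seq, i):
    """Return (L_i, U_i): counts of like and unlike pairs separated by distance i."""
    pairs = list(zip(seq, seq[i:]))          # (s_j, s_{j+i}) for every valid j
    like = sum(1 for a, b in pairs if a == b)
    return like, len(pairs) - like


def is_complementary(a, b):
    """True if L_i(A) = U_i(B) and U_i(A) = L_i(B) for every distance i."""
    if len(a) != len(b):
        return False
    for i in range(1, len(a)):
        la, ua = like_unlike(a, i)
        lb, ub = like_unlike(b, i)
        if la != ub or ua != lb:
            return False
    return True


# Illustrative pair of length 4 (an assumption of this sketch, not from the note).
A = (1, 1, 1, 0)
B = (1, 1, 0, 1)

if __name__ == "__main__":
    for i in range(1, 4):
        print(i, like_unlike(A, i), like_unlike(B, i))
    print("complementary:", is_complementary(A, B))
```

Because $L_i + U_i = n - i$ for both sequences, the second condition in `is_complementary` is redundant; it is kept so that the code mirrors the definition as stated.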

Golay proves many facts about comple- 
mentary series. In particular, he investigates 
the values of $n$ for which they can exist.
His theorems settle the existence question 
for n < 18 and for many larger values as 
well. He asserts that ‘it has been verified 
by trial that complementary series do not 
exist for n = 18.’ The purpose of this note 
is to give a reasonably compact proof of 
this fact. 


* Received by the PGIT, January 5, 1961.
1 M. J. E. Golay, “Complementary series,”
IRE TRANSACTIONS ON INFORMATION THEORY, vol. IT-7,
pp. 82-87; April, 1961.



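We let

$L^i(S) = L_i(S) + L_{2i}(S) + L_{3i}(S) + \cdots$
        = the number of like pairs whose
          subscripts are congruent mod $i$,

$U^i(S) = U_i(S) + U_{2i}(S) + U_{3i}(S) + \cdots$
        = the number of unlike pairs whose
          subscripts are congruent mod $i$.

For $1 \le j \le i$, we let

$p_{ij}(S) = s_j + s_{j+i} + s_{j+2i} + \cdots$
          = the number of values of $k$ such
            that $k \equiv j \pmod{i}$ and $s_k = 1$.

Thus $p_{11}$ = the number of 1’s in $S$, $p_{21}$ =
the number of 1’s in odd positions, $p_{22}$ = the
number of 1’s in even positions, etc. Suppose
that $i \mid n$, and that $k = n/i$. Then it is easy
to see that

$$L^i(S) = \sum_{j=1}^{i} \left[ \binom{p_{ij}}{2} + \binom{k - p_{ij}}{2} \right],$$

$$U^i(S) = \sum_{j=1}^{i} p_{ij}(k - p_{ij}).$$

As special cases, we see that

$L^1(S)$ = the number of like pairs in $S$
        $= \binom{p_{11}}{2} + \binom{n - p_{11}}{2}$,

$U^1(S)$ = the number of unlike pairs in $S$
        $= p_{11}(n - p_{11})$.

For brevity, let

$$f(p, k) = \binom{p}{2} + \binom{k - p}{2}, \qquad g(p, k) = p(k - p).$$

Now suppose that $A$ and $B$ are sequences
of length 18 and that $A \sim B$. We wish to
reach a contradiction. We first let

$$L_i = L^i(A), \quad U_i = U^i(B), \quad p_{ij} = p_{ij}(A), \quad q_{ij} = p_{ij}(B).$$

Then we have (if $i \mid 18$)

$$\sum_{j} f(p_{ij}, 18/i) = L_i = U_i = \sum_{j} g(q_{ij}, 18/i).$$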
This equation is our main tool. We shall
successively let $i = 1, 2, 3, 6$. In the case
$i = 1$, we find that

$$\binom{p_{11}}{2} + \binom{18 - p_{11}}{2} = q_{11}(18 - q_{11}).$$

Elementary algebra leads to

$$18 = (p_{11} - q_{11})^2 + (p_{11} + q_{11} - 18)^2.$$

[This is Golay’s (7).] Thus

$$18 = 3^2 + 3^2, \qquad p_{11} - q_{11} = \pm 3, \qquad p_{11} + q_{11} - 18 = \pm 3.$$



Then $\{p_{11}, q_{11}\} = \{6, 9\}$ or $\{12, 9\}$, where
the curly brackets indicate unordered
sequences. We can arrange that $q_{11} = 9$
by interchanging the sequences used for
$A$ and $B$ if necessary. Similarly, we can
arrange that $p_{11} = 6$ by “altering” the
sequence used for $A$ if necessary. We define
alteration thus:

$$0' = 1, \qquad 1' = 0,$$

$$(s_1, \dots, s_n)' = (s_1', \dots, s_n');$$

the alteration of $S$ is $S'$. Clearly, $S \equiv S'$
and $p_{11}(S') = n - p_{11}(S)$. Thus if $p_{11} = 12$,
then by replacing the sequence used for $A$
by its alteration, we still have a complementary
pair of sequences and $p_{11} = 6$.
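
The case $i = 1$ is small enough to check by brute force. The sketch below, an illustration with names chosen here rather than taken from the note, lists every pair $(p_{11}, q_{11})$ with $f(p_{11}, 18) = g(q_{11}, 18)$ and shows how alteration changes the number of 1's.

```python
from math import comb

N = 18

# All (p11, q11) with C(p,2) + C(18-p,2) = q(18-q), i.e. L^1(A) = U^1(B).
solutions = [
    (p, q)
    for p in range(N + 1)
    for q in range(N + 1)
    if comb(p, 2) + comb(N - p, 2) == q * (N - q)
]


def alteration(seq):
    """Interchange 0's and 1's; like and unlike pair counts are unchanged."""
    return tuple(1 - s for s in seq)


if __name__ == "__main__":
    print(solutions)
    s = (1, 1, 0, 1, 0, 0)
    print(sum(s), sum(alteration(s)))  # p11(S) and p11(S') add up to len(S)
```

As unordered pairs, the four solutions reduce to the two sets {6, 9} and {12, 9} stated in the text.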

We have seen that without real loss of
generality, we may assume that $p_{11} = 6$
and $q_{11} = 9$. Next we wish to let $i = 2$.
Obviously, we have

$$p_{21} + p_{22} = p_{11} = 6,$$

$$q_{21} + q_{22} = q_{11} = 9.$$

Recall that $L_i = L^i(A)$, $U_i = U^i(B)$, $p_{ij} = p_{ij}(A)$, and $q_{ij} = p_{ij}(B)$.
We consider all possible cases in the following small tables:

 p   f(p, 9)      {p21, p22}    f(p21, 9) + f(p22, 9) = L2
 0     36            6, 0          54
 1     28            5, 1          44
 2     22            4, 2          38
 3     18            3, 3          36 ✓
 4     16
 5     16
 6     18

 q   g(q, 9)      {q21, q22}    g(q21, 9) + g(q22, 9) = U2
 0      0            9, 0           0
 1      8            8, 1          16
 2     14            7, 2          28
 3     18            6, 3          36 ✓
 4     20            5, 4          40
 5     20
 6     18
 7     14
 8      8
 9      0

Thus, $L_2 = U_2$ implies that $\{p_{21}, p_{22}\} = \{3, 3\}$ and $\{q_{21}, q_{22}\} = \{6, 3\}$.
We can arrange that $q_{21} = 6$ and $q_{22} = 3$ by writing the
sequence used for $B$ backwards if necessary. It is clear that a
sequence written backwards is equivalent to the same sequence
written forward, and that this rearrangement of $B$ interchanges the
values of $q_{21}$ and $q_{22}$.


We now know that without real loss of generality, we may
assume

$$p_{11} = 6, \quad p_{21} = 3, \quad p_{22} = 3,$$

$$q_{11} = 9, \quad q_{21} = 6, \quad q_{22} = 3.$$

Next we take $i = 3$. Obviously

$$p_{31} + p_{32} + p_{33} = 6,$$

$$q_{31} + q_{32} + q_{33} = 9.$$

Furthermore, $p_{3j}$ and $q_{3j}$ are $\le 18/3 = 6$. The following tables
include all possibilities:

 p   f(p, 6)      {p31, p32, p33}    L3
 0     15            6, 0, 0         45
 1     10            5, 1, 0         35
 2      7            4, 2, 0         29
 3      6            4, 1, 1         27  II
 4      7            3, 3, 0         27  II
 5     10            3, 2, 1         23
 6     15            2, 2, 2         21  I

 q   g(q, 6)      {q31, q32, q33}    U3
 0      0            6, 3, 0          9
 1      5            6, 2, 1         13
 2      8            5, 4, 0         13
 3      9            5, 3, 1         19
 4      8            5, 2, 2         21  I
 5      5            4, 4, 1         21  I
 6      0            4, 3, 2         25
                     3, 3, 3         27  II

Using $L_3 = U_3$, we find that the following two cases are possible:

        {p31, p32, p33}        {q31, q32, q33}
 I          2, 2, 2         a     5, 2, 2
                            b     4, 4, 1
 II    a    4, 1, 1               3, 3, 3
       b    3, 3, 0


Similarly, in the case $i = 6$, we have

$$\sum_j p_{6j} = p_{11} = 6, \qquad \sum_j q_{6j} = q_{11} = 9,$$

and $p_{6j}$ and $q_{6j}$ are $\le 18/6 = 3$. The following tables include all cases.

 p   f(p, 3)      {p61, ..., p66}    L6
 0      3            330000          18
 1      1            321000          14
 2      1            311100          12  D
 3      3            222000          12  D
                     221100          10  C
                     211110           8  B
                     111111           6  A

 q   g(q, 3)      {q61, ..., q66}    U6
 0      0            333000           0
 1      2            332100           4
 2      2            331110           6  A
 3      0            322200           6  A
                     322110           8  B
                     321111          10  C
                     222210          10  C
                     222111          12  D

Using $L_6 = U_6$, we see that there are four cases:

        {p61, ..., p66}        {q61, ..., q66}
 A          111111          a     331110
                            b     322200
 B          211110                322110
 C          221100          a     321111
                            b     222210
 D     a    311100                222111
       b    222000
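
The case tables for $i = 2, 3, 6$ can be regenerated mechanically. The following sketch is an illustration only; the helper names are assumptions of this sketch. It enumerates every multiset of residue-class counts with the required totals (6 ones in $A$, 9 ones in $B$) and keeps those for which $\sum f(p_{ij}, 18/i) = \sum g(q_{ij}, 18/i)$.

```python
from itertools import combinations_with_replacement
from math import comb


def f(p, k):
    return comb(p, 2) + comb(k - p, 2)


def g(p, k):
    return p * (k - p)


def multisets(total, parts, largest):
    """Multisets of `parts` integers in 0..largest summing to `total` (descending tuples)."""
    return [
        tuple(sorted(c, reverse=True))
        for c in combinations_with_replacement(range(largest + 1), parts)
        if sum(c) == total
    ]


def matching_cases(i, n=18, ones_a=6, ones_b=9):
    """Map each common value L_i = U_i to the (p-multisets, q-multisets) attaining it."""
    k = n // i
    cases = {}
    for ps in multisets(ones_a, i, k):
        value = sum(f(p, k) for p in ps)
        for qs in multisets(ones_b, i, k):
            if sum(g(q, k) for q in qs) == value:
                p_set, q_set = cases.setdefault(value, (set(), set()))
                p_set.add(ps)
                q_set.add(qs)
    return {
        v: (sorted(p_set, reverse=True), sorted(q_set, reverse=True))
        for v, (p_set, q_set) in cases.items()
    }


if __name__ == "__main__":
    for i in (2, 3, 6):
        print(i, matching_cases(i))
```

For $i = 3$ the two surviving values 21 and 27 correspond to cases I and II; for $i = 6$ the values 6, 8, 10, and 12 correspond to cases A through D.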
Now let us arrange the $p_{6j}$ and the $q_{6j}$ into matrices:

$$\begin{pmatrix} p_{61} & p_{65} & p_{63} \\ p_{64} & p_{62} & p_{66} \end{pmatrix}, \qquad \begin{pmatrix} q_{61} & q_{65} & q_{63} \\ q_{64} & q_{62} & q_{66} \end{pmatrix}.$$

It is obvious that the row sums must be $p_{21} = 3$, $p_{22} = 3$ and
$q_{21} = 6$, $q_{22} = 3$, respectively. It is also obvious that the column
sums, in some order, must be $p_{31}$, $p_{32}$, $p_{33}$ and $q_{31}$, $q_{32}$, $q_{33}$, respectively.
Thus combining the cases for the $p_{3j}$, $q_{3j}$ and the $p_{6j}$, $q_{6j}$, we get
the following tables of all possible cases. We use $\binom{x}{y}$ to denote a
column vector. We write “impossible (col)” if it is impossible to
arrange the 6 numbers (either the $p_{6j}$ or the $q_{6j}$) so as to have the
correct column sums. We write “impossible (row)” if it is possible
to make the column sums correct, but not simultaneously possible
to make the row sums correct.


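Arrangements of the $p_{6j}$ (row sums 3 and 3):

 Case     {p3j}      {p6j}     Columns, in some order
 IA       2, 2, 2    111111    $\binom{1}{1}$ $\binom{1}{1}$ $\binom{1}{1}$
 IB       2, 2, 2    211110    impossible (row)
 IC       2, 2, 2    221100    $\binom{2}{0}$ $\binom{0}{2}$ $\binom{1}{1}$
 IDa      2, 2, 2    311100    impossible (col)
 IDb      2, 2, 2    222000    impossible (row)
 IIaA     4, 1, 1    111111    impossible (col)
 IIbA     3, 3, 0    111111    impossible (col)
 IIaB     4, 1, 1    211110    impossible (col)
 IIbB     3, 3, 0    211110    impossible (col)
 IIaC     4, 1, 1    221100    $\binom{2}{2}$ $\binom{1}{0}$ $\binom{0}{1}$
 IIbC     3, 3, 0    221100    $\binom{2}{1}$ $\binom{1}{2}$ $\binom{0}{0}$
 IIaDa    4, 1, 1    311100    $\binom{3}{1}$ $\binom{0}{1}$ $\binom{0}{1}$ or $\binom{1}{3}$ $\binom{1}{0}$ $\binom{1}{0}$
 IIaDb    4, 1, 1    222000    impossible (col)
 IIbDa    3, 3, 0    311100    impossible (col)
 IIbDb    3, 3, 0    222000    impossible (col)

Arrangements of the $q_{6j}$ (row sums 6 and 3):

 Case     {q3j}      {q6j}     Columns, in some order
 IaAa     5, 2, 2    331110    impossible (col)
 IaAb     5, 2, 2    322200    $\binom{2}{3}$ $\binom{2}{0}$ $\binom{2}{0}$
 IbAa     4, 4, 1    331110    $\binom{3}{1}$ $\binom{3}{1}$ $\binom{0}{1}$
 IbAb     4, 4, 1    322200    impossible (col)
 IaB      5, 2, 2    322110    $\binom{3}{2}$ $\binom{2}{0}$ $\binom{1}{1}$
 IbB      4, 4, 1    322110    $\binom{3}{1}$ $\binom{2}{2}$ $\binom{1}{0}$
 IaCa     5, 2, 2    321111    impossible (row)
 IaCb     5, 2, 2    222210    impossible (col)
 IbCa     4, 4, 1    321111    impossible (col)
 IbCb     4, 4, 1    222210    impossible (row)
 IaD      5, 2, 2    222111    impossible (col)
 IbD      4, 4, 1    222111    impossible (col)
 IIAa     3, 3, 3    331110    impossible (col)
 IIAb     3, 3, 3    322200    impossible (col)
 IIB      3, 3, 3    322110    $\binom{3}{0}$ $\binom{2}{1}$ $\binom{1}{2}$
 IICa     3, 3, 3    321111    impossible (col)
 IICb     3, 3, 3    222210    impossible (col)
 IID      3, 3, 3    222111    $\binom{2}{1}$ $\binom{2}{1}$ $\binom{2}{1}$

We now eliminate all cases which are impossible on either side and combine the tables
into the following single table.

 Case    Columns of the $p_{6j}$                                   Columns of the $q_{6j}$
 IA      $\binom{1}{1}$ $\binom{1}{1}$ $\binom{1}{1}$             a: $\binom{2}{3}$ $\binom{2}{0}$ $\binom{2}{0}$
                                                                   b: $\binom{3}{1}$ $\binom{3}{1}$ $\binom{0}{1}$
 IID     $\binom{3}{1}$ $\binom{0}{1}$ $\binom{0}{1}$ or          $\binom{2}{1}$ $\binom{2}{1}$ $\binom{2}{1}$
         $\binom{1}{3}$ $\binom{1}{0}$ $\binom{1}{0}$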
Now we are ready for our final step.
Golay proves [see his (4)] that if $A \sim B$,
then

$$a_i + a_{n+1-i} + b_i + b_{n+1-i} = \text{odd number} = 1 \text{ or } 3.$$

In our case, if we define $h(i)$ in the following
equation, we have

$$h(i) = a_i + a_{19-i} + b_i + b_{19-i} = \text{odd number}.$$

Therefore

$$p_{61} + p_{66} + q_{61} + q_{66} = h(1) + h(7) + h(13) = \text{odd number},$$

$$p_{62} + p_{65} + q_{62} + q_{65} = h(2) + h(8) + h(14) = \text{odd number},$$

$$p_{63} + p_{64} + q_{63} + q_{64} = h(3) + h(9) + h(15) = \text{odd number}.$$

In case IA in the last table, we must have

$$p_{61} + p_{66} = 2, \quad p_{62} + p_{65} = 2, \quad p_{63} + p_{64} = 2,$$

all even. Hence, the corresponding sums
of the $q_{6j}$ must all be odd. But in the two
possibilities for the $q$’s in case IA, either
five of the $q_{6j}$ are even, or five are odd.
Hence, only one of the sums

$$q_{61} + q_{66}, \quad q_{62} + q_{65}, \quad q_{63} + q_{64}$$

can be odd. Thus, case IA is impossible.

In case IID in the last table, we must have

$$q_{61} + q_{66} = 3, \quad q_{62} + q_{65} = 3, \quad q_{63} + q_{64} = 3,$$

all odd. Hence, the corresponding sums of
the $p_{6j}$ must all be even. Now each of
the sums

$$p_{61} + p_{66}, \quad p_{62} + p_{65}, \quad p_{63} + p_{64}$$

involves one term from the top of a vector
and one from the bottom. Consequently,
it is easy to see (for both possibilities in
case IID) that these sums in some order
are {4, 1, 1}, and hence are not all even.
This completes the proof that it is impossible
to have complementary series of
length 18.
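
The parity condition quoted from Golay in the final step can be spot-checked on short complementary pairs. The sketch below is an illustration, not part of the proof. It assumes the usual concatenation construction (from $A$, $B$ form $A|B$ and $A|B'$, with $B'$ the alteration of $B$) to produce pairs of lengths 2, 4, 8, and 16; that construction and all function names are assumptions of this sketch.

```python
def like_count(seq, i):
    """L_i: number of like pairs separated by distance i."""
    return sum(1 for a, b in zip(seq, seq[i:]) if a == b)


def is_complementary(a, b):
    """L_i(A) = U_i(B) for all i, using U_i(B) = (n - i) - L_i(B)."""
    n = len(a)
    return len(b) == n and all(
        like_count(a, i) == (n - i) - like_count(b, i) for i in range(1, n)
    )


def h_values(a, b):
    """h(i) = a_i + a_{n+1-i} + b_i + b_{n+1-i}, listed for i = 1..n."""
    n = len(a)
    return [a[i] + a[n - 1 - i] + b[i] + b[n - 1 - i] for i in range(n)]


def double(a, b):
    """Assumed doubling step: (A|B, A|B') with B' the alteration of B."""
    return a + b, a + tuple(1 - x for x in b)


if __name__ == "__main__":
    pair = ((1, 1), (1, 0))
    for _ in range(4):
        print(len(pair[0]), is_complementary(*pair), h_values(*pair))
        pair = double(*pair)
```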


JOSEPH B. KRUSKAL
Bell Telephone Labs., Inc. 
Murray Hill, N. J. 


On ‘‘Upper Bounds for Error Detec- 
ting and Correcting Codes of Finite 
Length’’* 


In a recent article,1 Wax cited the results
of Laemmel that the best value for the
number of sequences of length 14 which
correct two errors is “48(?).” A better
value is 64. To see this, consider a code2
developed by Bose and Chaudhuri of length
15 with 7 information bits which is able to
correct 2 errors. Leaving off one information
bit, this would be a code of length 14 with
6 information bits, or 64 code points.

* Received by the PGIT, January 6, 1961.
1 N. Wax, “On upper bounds for error detecting
and correcting codes of finite length,” IRE TRANSACTIONS
ON INFORMATION THEORY, vol. IT-5,
pp. 168-174; December, 1959.

There are several entries in Table I of
Wax’s paper which are blank. A MacDonald3
maximum-minimum distance code will give
8 code points with length 17 capable of
correcting four errors. There is also an
optimum group code with $2^9 = 512$ code
points of length 17 that is capable of
correcting 2 errors.

These results are consistent with the
bounds Wax found. It should be noted
that these results appeared after Wax’s
paper was published.
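
The shortening argument for length 14 can be illustrated numerically. The sketch below is not from the note. It assumes the commonly tabulated generator polynomial $g(x) = x^8 + x^7 + x^6 + x^4 + 1$ for the double-error-correcting Bose-Chaudhuri code of length 15 with 7 information bits, builds the code by polynomial multiplication over GF(2), and then drops one information bit. A minimum nonzero weight of 5 means any two code points differ in at least 5 places, which is what correcting 2 errors requires.

```python
# Assumed generator polynomial: bit i of G is the coefficient of x^i.
G = 0b111010001  # x^8 + x^7 + x^6 + x^4 + 1


def gf2_mul(a, b):
    """Multiply two GF(2) polynomials stored as integer bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def codewords(info_bits):
    """All products m(x) g(x) with deg m < info_bits."""
    return [gf2_mul(m, G) for m in range(1 << info_bits)]


def min_weight(words):
    """Minimum Hamming weight over the nonzero code words (= minimum distance)."""
    return min(bin(w).count("1") for w in words if w)


full = codewords(7)       # length 15, 128 code points
shortened = codewords(6)  # top position always 0, so length 14, 64 code points

if __name__ == "__main__":
    print(len(full), min_weight(full))
    print(len(shortened), min_weight(shortened))
```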


F. F. SELLERS 
Reliability Technology 
IBM Product Dev. Lab. 
Poughkeepsie, N. Y. 


2 R. C. Bose and D. K. Ray-Chaudhuri, “On a
class of error correcting binary group codes,”
Information and Control, vol. 3, pp. 68-79; March, 1960.
3 J. E. MacDonald, “Design methods for maximum
minimum-distance error detecting and correcting
codes,” IBM J. Res. & Dev., vol. 4, pp. 43-57;
January, 1960.
4 W. W. Peterson, “Error Detecting and Correcting
Codes,” Technology Press and John Wiley and
Sons, Inc., New York, N. Y.; 1961. A table of optimum
group codes is given in Chapter 5 of the book.

CORRECTIONS


Phillip Bello, author of “On the Approach of a Filtered
Pulse Train to a Stationary Gaussian Process,” which
appeared on pp. 144-149 of the July, 1961, issue of these
TRANSACTIONS, has called the following to the attention
of the Editor.

On p. 149, first column, (58) should read: 


[The corrected Eq. (58) is not legible in this copy.]


On the same page, in the line following (56), J, should 
be replaced by script 3,. Similarly, the J, in (59) should 
be script 3,. 


Carl W. Helstrom, author of ‘“‘Maximum-Weight Group 
Codes for the Balanced M-ary Channel,’ which appeared 
on pp. 550-555 of the December, 1960, issue of these 
TRANSACTIONS, has called the following to the attention 
of the Editor. 

After the author returned the proofs of the above paper, 
Tables II, III, and IV were altered without his knowledge 
into a form that may be misunderstood. In Table II, 
fork = 4,8 <h < 14, “1, 2, 8, 4, 124, 184, 234” should 
have been a ditto mark. The same holds in Tables III 


and IV for “01” for k = 2) h > Aly and tors 001s tor 
K=O alee als 

To find the proper columns of the MRT for any value 
of h, the column whose label appears after the + sign 
should be appended to the set of columns used for h — 1. 
For example, the labels of the columns of the MRT to 
be used fora. code with M = 4) kh = 3, hi =soare005 
010, 100, 111, 16a. 

The weights w; given in Table III for M = 3,k = 3, 
h = 2:shouldireads 0-2 


D. C. Youla, author of “On the Factorization of 
Rational Matrices,’’ which appeared on pp. 172-189 of 
the July, 1961, issue of these Transactions, has called 
the following to the attention of the Editor. 

On p. 172, second column, line 24 from the top: 


“b,,(p)” should read “‘f,.(p).” 


On p. 180, first column, second line above (89): 
“20, = 2, should read’ 220, == 20, =="20,, =e 
On p. 188, first column, in (179): 
“Wi! (p)” should read “W,(p).” 


On p. 188, first column, in the second line above (180): 
“VW(p)” should read “‘G(p).” 


On p. 189, first column, in line 12 from the top: 
“(178)” should read ‘(179).”’ 
An Analog Pulse Correlator—T. Bartokowski and J. Seidler (in
Polish). (Zeszyty Naukowe Politechniki Gdanskiej, no. 20, pp.
49-65; 1960.)

A description is given of an analog pulse correlator for the band
extending from 50 cps to 20 kc. The method applied is based on the
sampling principle. Two time-shifted samples modulate the height 
and width of pulses, and a voltage proportional to the correlation 
function is obtained by integration of the train of such doubly 
modulated pulses. The equipment is suitable for measuring both 
auto- and cross-correlation functions. 


Frequency-Time Transposition for the Measurement of an Un- 
known Frequency, II—R. H. Baumann (in French). (Ann. de 
Radioélec., vol. 16, pp. 69-92; January, 1961.) 


In Part I of this investigation (see July, 1961, Abstracts), it was 
shown that the circulating sinusoidal signals in a closed-loop system, 
which consists of a delay line and a modulator, are transformed 
into impulses whose time shift is a measure of the unknown Doppler 
frequency to be determined. 

In Part II, the system limitations are considered; more specifically, 
an imperfect system with spurious signals in the delay line and in 
the modulator is treated theoretically. This analysis provides a 
good explanation of the anomalous experimental results described
in Part I. 

In conclusion, several methods are proposed for improving the 
system stability and for minimizing the undesirable effect of spurious 
signals. The maximum number of recirculations obtained experi- 
mentally with one of these methods was 400, which corresponds to 
a twenty-fold improvement in the signal-to-noise amplitude ratio, 
and to the determination of the Doppler frequency to within 
0.5 per cent. 


Sequential Signal Detector, Rayleigh Case—J. W. Caspers (in 
English). (Navy Electronics Lab., San Diego, Calif., Rept. No. 730; 
November 7, 1956.) 

A detector based on the sequential probability ratio test has 
been examined theoretically. It is shown that it will respond to 
signals more rapidly, on the average, than the best nonsequential 
detector. Equations and curves are given which predict the per- 
formance of the sequential detector and compare it with that 
of the Neyman-Pearson detector. Further analyses and experi- 
mentation are outlined which are directed toward a more compre- 
hensive understanding of the sequential detector, toward its 
simplification, and toward an estimation of its value as a SNR 
estimator. 

An Experimental Sequential Detector—G. M. Dillard and R. E.
Simmons (in English). (Navy Electronics Lab., San Diego, Calif., 
Rept. No. 999; November 8, 1960.) 

A binomial sequential detector has been developed to facilitate 
laboratory experiments in the development of radar detection 
techniques based on statistical methods. The system has been 
found to be capable of performance comparable with predictions 
made by exact equations and certain approximations valid for 
large sample cases. A device for generating a random sequence 
of ones and zeros, with the probability of a one fixed, has also been 
built. Extensive laboratory experiments employing both equipments 
are described. 


Television Band Compression by Contour Interpolation—D. Gabor
and P. C. J. Hill (in English). (Proc. IEE, vol. 108, pt. B, pp.
303-315; May, 1961.)


It is proposed to economize bandwidth by transmitting a pro- 
portion only of the scanning lines in the picture: in the simplest 
case alternate lines would be transmitted, but in more extreme 
applications only one in four or one in eight. The missing lines 
are to be reconstructed in the receiver by a process of interpolation 
which is sensitive to contours (edges) and inserts the transition 
in the interpolated scanning line in such a way as to maintain 
straightness of contour. Contours are identified by means of a 
differentiating circuit, followed by further gating and clipping 
circuits to produce a sharp pulse at the leading edge of each contour. 
If a, b, and c are successive lines, then a and c must be scanned
simultaneously so that b can be interpolated, and this requires
that a be stored until the whole of c has been scanned. When the
scanning spot on one line reaches a contour it stops, and the scanning 
spot on the other line is caused to move at double velocity until it 
also reaches the contour. The scanning velocity for the interpolated 
line is the mean of the two. The brightness of the interpolated line 
is fixed by weighting the brightness of each of the adjacent lines 
with the velocity of its scanning spot and taking the mean. Thus, 
when both spots are moving, the brightness of the interpolated 
line is the mean of those of adjacent lines, but when one spot stops 
the brightness of the interpolated line is that of the spot which is 
still moving, i.e., of the picture before the edge. The system was
tried out on a slow-speed model similar to a facsimile system, with 
transmission of a photograph and reception on photographic paper. 
The velocity modulation of the spot was carried out mechanically 
and controlled by a pilot spot on each of the pair of lines scanning 
ahead of the transmitting spots. 

A scheme is proposed for an electronic realization of such a 
system. At the transmitter, the picture is to be produced on a
double-beam cathode-ray tube and transferred to a camera tube. 
The picture may be read out two lines at a time from a single-beam 
camera tube by oscillating the beam between two lines; and the 
change of sweep velocity required on encountering a contour can 
be superimposed on the movement of the beam. The receiver for 
a 2:1 compression system would require two storage tubes; for 8:1 
compression the receiver would require six camera tubes and six 
cathode-ray tubes of which one would be double-beam and two 
would have variable-velocity scanning. Line-by-line rate equalization 
methods should give a compression of 3:1 and by combining this with 
a compression of either 4:1 or 8:1 by means of interpolation, the 
over-all compression should be either 12:1 or 24:1. 


Waiting System With Full Availability Where the Holding Time 
of the Incoming Trunk is Larger Than That of the Outgoing Trunk— 
E. Gambe (in Japanese). (J. Inst. Elec. Commun. Engrs. (Japan), 
vol. 44, pp. 227-233; February, 1961.) 

Probabilities and other related formulas of various simultaneous 
connections of a waiting system with full availability, where the 
holding time of incoming trunk is longer than that of outgoing 
trunk, are derived for the stationary state under the assumption 
that the occurrence of calls is Poisson distributed and the holding
time is exponentially distributed. A method of calculation of these 
for the multistage waiting system by applying known formulas is 
discussed, and it is shown that the numerical tables and curves for 
an ordinary waiting system with full availability are applicable to 
the multistage waiting system under a certain constraint on the 
calculation of calls. 


On the General Definition of the Amount of Information—I. M. 
Gel’fand, A. N. Kolmogorov, and A. M. Yaglom (in Russian). 
(Doklady Akad. Nauk S.S.S.R., vol. 111, no. 4, pp. 745-748; 1956.) 


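The system $S$ of “random events” $A, B, C, \dots$ is supposed to
be a Boolean algebra over which a normed, non-negative function
$P(A)$ is defined. $P(A)$ is additive for nonoverlapping $A$, $B$. An
“experiment” is identified with a subalgebra of the algebra $S$.
Otherwise speaking, a subalgebra $\mathfrak{A}$ consists of all events whose
results become known after the end of the experiment. If subalgebras
$\mathfrak{A}$ and $\mathfrak{B}$ are finite, then the amount of information contained
in the results of experiment $\mathfrak{A}$ about the results of experiment $\mathfrak{B}$ is
given by Shannon’s formula

$$I(\mathfrak{A}, \mathfrak{B}) = \sum_{i,j} P(A_i B_j) \log \frac{P(A_i B_j)}{P(A_i)\,P(B_j)}.$$

In the general case, it is natural to put
$I(\mathfrak{A}, \mathfrak{B}) = \sup_{\mathfrak{A}_1 \subseteq \mathfrak{A},\ \mathfrak{B}_1 \subseteq \mathfrak{B}} I(\mathfrak{A}_1, \mathfrak{B}_1)$,
where $\mathfrak{A}_1$, respectively $\mathfrak{B}_1$, are finite subalgebras of $\mathfrak{A}$,
respectively $\mathfrak{B}$. Thus, the defined amount of information has the
following known properties. 1) $I(\mathfrak{A}, \mathfrak{B}) = I(\mathfrak{B}, \mathfrak{A})$. 2) $I(\mathfrak{A}, \mathfrak{B}) = 0$
if, and only if, $\mathfrak{A}$ and $\mathfrak{B}$ are independent. 3) If the smallest subalgebras,
$[\mathfrak{A}_1 \cup \mathfrak{B}_1]$, $[\mathfrak{A}_2 \cup \mathfrak{B}_2]$, which contain $\mathfrak{A}_1 \cup \mathfrak{B}_1$ and $\mathfrak{A}_2 \cup \mathfrak{B}_2$ are
independent, then $I([\mathfrak{A}_1 \cup \mathfrak{A}_2], [\mathfrak{B}_1 \cup \mathfrak{B}_2]) = I(\mathfrak{A}_1, \mathfrak{B}_1) + I(\mathfrak{A}_2, \mathfrak{B}_2)$.
4) If $\mathfrak{A}_1 \subseteq \mathfrak{A}_2$, then $I(\mathfrak{A}_1, \mathfrak{B}) \le I(\mathfrak{A}_2, \mathfrak{B})$.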
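
For finite subalgebras, Shannon's formula above is a double sum over the atoms of the two algebras and is easy to evaluate. The sketch below is an illustration only: the joint probability table, the function name, and the choice of base-2 logarithms are assumptions of this sketch. It computes the amount of information from a table of $P(A_i B_j)$ and exercises properties 1) and 2).

```python
from math import log2


def information(joint):
    """Sum of P(A_i B_j) * log2[P(A_i B_j) / (P(A_i) P(B_j))] over a joint table.

    `joint[i][j]` is P(A_i B_j); cells with zero probability contribute nothing.
    """
    row = [sum(r) for r in joint]        # P(A_i)
    col = [sum(c) for c in zip(*joint)]  # P(B_j)
    return sum(
        p * log2(p / (row[i] * col[j]))
        for i, r in enumerate(joint)
        for j, p in enumerate(r)
        if p > 0
    )


def transpose(joint):
    """Swap the roles of the two experiments."""
    return [list(c) for c in zip(*joint)]


if __name__ == "__main__":
    independent = [[0.25, 0.25], [0.25, 0.25]]
    dependent = [[0.1, 0.2], [0.3, 0.4]]
    print(information(independent))
    print(information(dependent), information(transpose(dependent)))
```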

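The following theorems are considered to be almost obvious.
Theorem 1: if the algebra $\mathfrak{A}_1 \subseteq \mathfrak{A}$ is everywhere dense over $\mathfrak{A}$ in
the sense of the metric $\rho(A, B) = P(AB' \cup A'B)$, then $I(\mathfrak{A}_1, \mathfrak{B}) = I(\mathfrak{A}, \mathfrak{B})$.
Theorem 2: if $\mathfrak{A}_1 \subseteq \mathfrak{A}_2 \subseteq \cdots \subseteq \mathfrak{A}_n \subseteq \cdots$ and $\mathfrak{A} = \bigcup_n \mathfrak{A}_n$,
then $I(\mathfrak{A}, \mathfrak{B}) = \lim_{n \to \infty} I(\mathfrak{A}_n, \mathfrak{B})$. Theorem 3: if the sequence of
distributions $P^{(n)}$ converges on $[\mathfrak{A} \cup \mathfrak{B}]$ towards $P$, then for corresponding
amounts of information, $\liminf_{n \to \infty} I^{(n)}(\mathfrak{A}, \mathfrak{B}) \ge I(\mathfrak{A}, \mathfrak{B})$.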

Now let $S$ be a $\sigma$-algebra and let all distributions $P(A)$ be $\sigma$-additive.
Let $X$ denote “a measurable space” and let $\xi^*(A)$ denote an
event that $\xi \in A$. One may consider a random element $\xi$ of $X$ as a
homomorphic mapping $\xi^*(A) = B$ of a Boolean algebra $S_X$ into
the Boolean algebra $S$. The subalgebra of $S$ obtained under the
mapping $\xi^*$ of $S_X$ will be denoted by $S_\xi$. It is natural to assume
$I(\xi, \eta) = I(S_\xi, S_\eta)$.

Let $X \times Y$ be a measurable space. The condition $(\xi, \eta)^*(A \times B) =
\xi^*(A)\,\eta^*(B)$ defines the homomorphism of the algebra $S_{X \times Y}$ into $S$.
This defines in turn a pair $(\xi, \eta)$ as a random element of the space
$X \times Y$. The formula $P_\xi(A) = P[\xi^*(A)]$ gives a measure in $X$.
If a measure on the product space $X \times Y$ is also given, then according
to the Radon-Nikodym theorem
$P_{\xi\eta}(C) = \iint_C a(x, y)\, dP_\xi\, dP_\eta + S(C)$,
where the measure $S$ is singular with respect to $P_\xi \times P_\eta$.

Theorem 4: $I(\xi, \eta)$ is finite only if $S(C) = 0$ and we have then
$I(\xi, \eta) = \iint_{X \times Y} a(x, y) \log a(x, y)\, dP_\xi\, dP_\eta$.

Suppose now that $X$ and $Y$ are complete metric spaces. Theorem
5: if for random elements $\xi \in X$ and $\eta \in Y$ the distributions $P^{(n)}_{\xi\eta}$
converge weakly towards the distribution $P_{\xi\eta}$, then for corresponding
amounts of information, $\liminf_{n \to \infty} I^{(n)}(\xi, \eta) \ge I(\xi, \eta)$.


Error Statistics Utilizing the Code Translation Data System over
Various Media—B. J. Hofmann (in English). (Lincoln Lab., M.I.T.,
Lexington, Mass., Rept. No. 25G0026; March 28, 1961.)

In order to obtain the comparative characteristics of the Lincoln
Laboratory code translation data system (CTDS) on various digital
data transmission media, a series of tests was run from October,
1959, through August, 1960, over six private line loop circuits. Three 
of these were K-carrier circuits; two were microwave TD-2 (L-3 
carrier) circuits; and one was an H-44 cable. 

It is shown that the average error rate for both the K-carrier cir- 
cuits and the H-44 cable is about 1 bit error in 10°, and a magnitude 
higher for the microwave circuits. The number of bit errors per word 
error varies from below 2 to over 6, the higher number being pre- 
dominant. The temporal distribution of errors is shown to be of a 
burst nature interspersed with long quiet intervals. 


A Note on Optimum Linear Multivariable Filters—R. J. Kavanagh
(in English). (IEE Monograph No. 439 M; April, 1961.)

A multivariable system is described by a matrix of power spectra 
which specifies the spectral distribution of signals from each variable 
and the correlations between variables. One problem is to specify 
a physically realizable filter which will transform this set of spectra 
into another set in which outputs from individual variables are both 
uncorrelated and have white-noise spectral distributions. An explicit 
solution can be found in three stages. First, a multiplier A(p) will
be found which transforms the original matrix into a diagonal
matrix corresponding to the elimination of cross-correlations. Then
the matrix is to be further modified by a multiplier B(p) so that
the individual spectra will be of white-noise form. The product
B(p)A(p) will not in general correspond to the transfer function
of a physically realizable system, so it is to be multiplied by C(p)
which has zeros placed so as to cancel all right-half p-plane poles
of B(p)A(p). The final form D(p) = C(p)B(p)A(p) is physically
realizable, but usually includes nonminimum-phase elements. The 
converse problem, of converting a set of incoherent white-noise 
signals into a set having a desired matrix of power spectra, can be 
accomplished by the same technique. A two-variable example is 
given. 


On an Approximation to the Correlation Function of Passive 
Disturbances—J. Kulikowski (in Polish). (Prace Przemyslowego 
Instytutu Telekomunikacji, no. 31, pp. 29-32; 1960.) 


The Fourier spectrum of an autocorrelation function may be 
approximated by means of a rational function. This leads in turn to 
the possibility of a useful approximation of the autocorrelation 
function itself. 


A Description of the Statistical Properties of Phase by Means of 
Periodically-Normal Functions—J. Kulikowski (in Polish). (Prace 
Przemyslowego Instytutu Telekomunikacji, no. 31, pp. 33-37; 1960.) 


The density functions of phase are in many cases too complicated 
to be immediately used in practical computations. It is shown that 
the approximations to these densities by means of periodically- 
normal functions leads to substantial simplifications. 


Synthesis and Perception of Japanese Fricative Sounds—K. Nakata 
and Y. Kadokawa (in Japanese). (J. Inst. Elec. Commun. Engrs. 
(Japan), vol. 44, pp. 221-227; February, 1961.) 

Experiments on the synthesis of Japanese fricative sounds are 
carried out and their results are given in this article. 1) For unvoiced 
fricative sounds, the frequency characteristics of the noise spectra, 
the relative intensity of the consonantal part by noise to the following
vowel by buzz, and the frequency locus of the second formant of 
their transitional parts are studied. 2) For voiced fricative sounds, 
the transitional characteristics of the first formant and the time 
relationship of hiss and buzz sources are considered as the factors 
of voicing, and their effects on voicing of fricative sounds are studied. 

The features of Japanese fricative sounds are discussed, comparing 
the results of their synthesis to English fricative sounds. A new 
method of synthesis is applied for the [h] sound which has a higher 
audibility than by any other method; the characteristics of this 
method are given in detail. 


A Consideration of P-Nary Codes—S. Noguchi, et al. (in Japanese). 
(J. Inst. Elec. Commun. Engrs. (Japan), vol. 44, pp. 205-211; 
February, 1961.) 

Fundamental problems of P-nary codes are considered here. 
By introducing measures in the code space, a concept of distance 
between two codes is defined and the relationship between the 
distance and its physical meanings, which is represented here by 
transition probabilities, is derived. 

A set of codes with distance greater than a specified value is 
composed in the code space spanned by the distance of codes from 
the viewpoint of the theory of groups, and the number of the codes 
is calculated. When P = 2, this number coincides with that of 
Muller. 


Small Signal Detection through Binomial Sequential Analysis— 
C. Nuese (in English). (Navy Electronics Lab., San Diego, Calif.,
Rept. No. 766; March 25, 1957.) 

A method for radar detection by sequential analysis through 
Bernoulli trials of Rayleigh-distributed data is described. This 
method requires less delicate circuits than methods using the 
complete data, at a cost of increasing the average sample size by 
55 per cent. An optimum voltage quantizing level is computed 
and curves are presented showing the average sample sizes required. 
A digital computer is proposed to perform this test. 
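
As an illustration only, and not taken from the report itself, a sequential test on Bernoulli trials derived from Rayleigh-distributed data might be sketched as follows. The sketch assumes that both hypotheses give Rayleigh-distributed samples differing only in scale, that each sample is reduced to one bit by comparison with a quantizing level, and that Wald's approximate thresholds are used. The names `exceed_prob` and `binomial_sprt`, the scale parameters, and the quantizing level are hypothetical; the level is not the optimum one computed in the report.

```python
import math


def exceed_prob(level, sigma):
    """Probability that a Rayleigh(sigma) sample exceeds the quantizing level."""
    return math.exp(-level ** 2 / (2.0 * sigma ** 2))


def binomial_sprt(samples, level, sigma0, sigma1, alpha=0.01, beta=0.01):
    """Wald sequential test on one-bit quantized samples.

    sigma0 / sigma1 are the assumed Rayleigh scales under "noise" and
    "signal"; alpha and beta are the nominal false-alarm and miss rates.
    Returns (decision, number_of_samples_used).
    """
    p0 = exceed_prob(level, sigma0)   # P(bit = 1 | noise only)
    p1 = exceed_prob(level, sigma1)   # P(bit = 1 | signal present)
    upper = math.log((1.0 - beta) / alpha)   # accept "signal" at or above this
    lower = math.log(beta / (1.0 - alpha))   # accept "noise" at or below this
    llr = 0.0
    n = 0
    for n, x in enumerate(samples, start=1):
        if x > level:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1.0 - p1) / (1.0 - p0))
        if llr >= upper:
            return "signal", n
        if llr <= lower:
            return "noise", n
    return "undecided", n
```

Only the bit `x > level` enters the test, which is the sense in which the circuits can be simpler than those using the complete data. The sketch makes no attempt to reproduce the sample-size figures quoted above.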


On Certain Problems Concerning the Detection of Weak Signals— 
B. Picinbono (in French). (Ann. des Telecommun., vol. 16, pp. 2-27; 
January-February, 1961.) 


In the first part, the author describes the realization of an ex- 
perimental method designed to measure the one-dimensional 
probability density of a stationary noise voltage. This method is 
used to study probability curves which are difficult to obtain by 
computation, as well as to show experimentally the tendency 
toward a Gaussian law caused by selective filtering of non-Gaussian 
noise. This problem is studied from a theoretical viewpoint in the 
second part, where a certain number of necessary and sufficient 
conditions are given. Finally, the last part is devoted to the study 
of the effect of clipping on the detection of weak signals by correlation 
techniques. 


Signal Processing in Radar Astronomy-Communication via Fluctuat- 
ing Multipath Media—R. Price and P. E. Green, Jr. (in English). 
(Lincoln Lab., M.I.T., Lexington, Mass., Tech. Rept. No. 234;
October 6, 1960.) 


Measurement of, and effective detection of or communication 
via, a propagation medium (such as is encountered in reflection 
from a deep fluctuating radar target, or in scatter-communications 
or underwater sound) that has significant multipath spread and/or 
fluctuation rate is studied in terms of the scattering function which 
describes how the medium redistributes the transmitted power in 
time and frequency. 

In measurement, matched-filter detection is employed, the 
ambiguity function (Woodward-Ville time-and-frequency correlation 
function) of the transmission giving a “window” through which 
the scattering function is observed. The scattering function is related 
to physical and geometrical properties of deep fluctuating radar 
targets, and the extent of its time-frequency spread is shown to
affect its measurability. 

The detection-communication analysis assumes that at the 
receiver input there is sufficient white noise that a low SNR prevails 
within the bandwidth of the signal that arrives via the propagating 
medium. Optimum detection, and detection using a simple filter, 
are considered, the former having several alternate circuit realiza- 
tions. For a medium having both multipath spread and fluctuation, 
a RAKE radiometer receiver is derived for the optimum detector, 
which is an iteration of a weighted radiometer; the latter structure is
an optimum detector for a single-path fluctuating medium, and 
combines radar sweep integration with radiometry. Optimum 
fixed-energy transmissions are found for certain scattering functions, 
and are shown to be “low-TW” and to have a duration that is 
the geometric mean of the multipath spread and the fluctuation 
period. For media of small time-frequency spread the simple filter- 
detector works nearly as well as the optimum detector, but the 
effectiveness of both decreases as the spread increases, the filter- 
detector becoming steadily worse relative to the optimum detector. 

Some actual radar experiments on the Sun and Venus, relating to 
the preceding analysis, are briefly described. 


Semi-Coherent Detection—I. S. Reed (in English). (The RAND 
Corp., Santa Monica, Calif., Rept. No. P-2106; September 19, 1960.) 


In this paper a method is presented for encoding and transmitting
messages over an RF channel which could be shifted in
frequency by a Doppler frequency. The receiver-detection process 
described herein avoids the necessity of a precise knowledge of the 
Doppler shift; it is only necessary to know the frequency band in 
which the shift lies. Thus, with this process, called semicoherent 
detection, the need for either a filter bank of matched filters or a 
Doppler tracking filter is avoided. In view of the receiver simplicity 
this communications technique could well have application in 
communicating to satellites and space probes. Its detection
sensitivity is comparable to square-law detection over the same time-
bandwidth product. 


The Design of an “Error-Free” Data Transmission System for
Telephone Circuits—B. Reiffen, et al. (in English). (Lincoln Lab.,
M.I.T., Lexington, Mass., Rept. No. 25G-0029; December 22, 1960.)


Recent experimental results verify that easily implemented codes 
can detect essentially all errors occurring in digital data sent over 
toll grade telephone circuits. A two-way communications system is 
described which uses these codes to detect the occurrence of errors 
and requests a retransmission of any data in error. The feedback 
logic is described in detail and the buffer required to adapt the 
system to various data sources is discussed. Extrapolated experi- 
mental results indicate that the system will deliver data in each 
direction at an average rate of approximately 7/8 the modem bit 
rate with a mean time to error of several hundreds of years. The 
philosophy that guided the design of this system can be applied to 
other media where high noise bursts or low signal levels occur 
infrequently. 


Sequential Detection Statistics—R. E. Simmons and R. A. Worley 
(in English). (Navy Electronics Lab., San Diego, Calif., Rept. 
No. 963; April 4, 1960.) 

The sequential probability ratio test (detection process) was 
investigated under the assumption of Rayleigh distributed signal 
and noise; both an optimum sequential test and a sequential test 
on binomially quantized observations were considered. For the 
case of a small sample size, methods are given for finding the statis- 
tical moments and distribution of sample size (detection time) and 
the error probabilities (false alarm and miss probabilities). Results 
of three methods of prediction of detector performance are compared.
Methods are indicated for predicting the behavior of a sequential
test modified by forced termination and a sequential test performed
with the target present during only part of the test. 


Biological and Artificial Intelligence—D. I. Sweitzer (in English).
(Jet Propulsion Lab., Pasadena, Calif., Literature Search 254; 
December, 1960.) 

During recent years an interest has been generated in the possibility
of constructing a machine to simulate thought processes,
even to the extent of making independent decisions on the basis 
of sensory data coupled with programmed or learned experience. 
A compilation has been made of references dealing with biological 
intelligence, including perception, learning and decision making,
and also the simulation of these abilities. The following sources 
were consulted: Armed Services Technical Information Agency, 
Technical Abstract Bulletins, through March, 1960; Aero/Space 
Engineering, 1958-1959; Computers and Automation, 1957-February, 
1960; IRE Proceedings, 1959; Journal of Symbolic Logic, 1936-1941,
1958—June, 1959; Psychological Abstracts, January, 1954—October, 
1959; JPL Library Additions and files; and miscellaneous periodicals. 

The material is divided into eighteen sections, as follows: feasi- 
bility of simulating thought processes by machine; general automata; 
models and theories of general thinking processes and behavior; 
intelligence testing; models and theories of nerve transmission; 
electrical study of the brain and its functions; physiology and 
anatomy of the brain; machine perception-mapping; the perceptron; 
automatic translators; models and theories of perception; machine 
learning and memory; models and theories of learning; models
and theories of memory or recall; decision making by machine; 
models and theories of decision making; information theory; and 
man-computer symbiosis. An author index is also given. 


The Principle of Causality and the Second Principle of Thermo- 
dynamics—J. P. Terletsky (in French). (J. de Physique et le Radium, 
vol. 21, pp. 681-684; October, 1960.) 


It is shown that the most reasonable interpretation of the causality 
principle is to consider it as a consequence of the second principle 
of thermodynamics. It is considered, according to information 
theory, that a signal is a localized perturbation, transporting 
negentropy. In this case, the second principle and the invariance 
interval of the universe prohibit signals faster than light. However, 
localized perturbations, transporting energy faster than light, can 
still exist, if they do not transport negentropy, that is, if they are 
statistical fluctuations. Thus, particles of imaginary mass, moving 
with greater speed than light, can be admitted as physical realities, 
but the process of emission or absorption can only have the char- 
acteristics of a fluctuation, and arise without any systematical 
change of the entropy of the emitting or absorbing body. 


The following papers were published singly by the Professional Group on Information
Theory (I) and the Professional Group on Automata and Automatic Control (A) of
the Institute of Electrical Communication Engineers of Japan, 2-8, Fujimicho,
Chiyodaku, Tokyo, Japan. All are in Japanese; English abstracts are given when available.


Information Recognition and Connection Matrices (I; February 
24, 1961)—H. Enomoto. 


When information symbols belonging to an information source 
are transmitted to a receiver, transitions between symbols are 
frequently introduced by various disturbances. The statistical nature
of the transition relation can be represented by a transition prob- 
ability matrix. In this paper, the topological structure which fixes 
the transition relation between information symbols is discussed 
by using a multihole torus and a transformation group. The common 
properties of some of the information symbols are in close relation 
with a closed path circulating around some holes of torus. Because 
the normal subgroup of the transformation group has many in- 
teresting characteristics, a systematic recognition method is obtained. 


The Theory of the Pattern Recognition (I; March 31, 1961)—N. 
Honda. 

In this paper a mathematical method of simplifying pattern 
recognition is described. It is assumed that by suitable treatments, 
patterns may be represented in an n-dimensional space Γ whose
coordinates have only two values, 0 or 1. We have a set of
conceptional letters and each conceptional letter corresponds to a
point-set in the Γ space. We assume that these sets of points are
given in advance. If all pairs of sets in the Γ space which correspond
to different conceptional letters have no common subset, the space
is said to be separable.

When we receive a pattern, the process of the recognition is to 
decide on the conceptional letter corresponding to the received 
pattern, using the given knowledge on sets of patterns. When the 
dimension n of the space is large, the above process is troublesome, 
and it is desired to reduce the dimensionality of the space. In 
general, it is possible to map the Γ space into another space Γ′,
conserving separability, where the dimensionality of the latter
space is smaller than that of the former. In this paper, the mapping
functions are restricted to the linear functions modulo 2. The
conditions and methods of obtaining the optimal mapping functions 
are considered. 
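
The idea of mapping binary patterns into a space of lower dimension while conserving separability can be illustrated with a short sketch. It is not the method of the paper, whose optimality conditions are not given in this abstract. It is a brute-force search, usable only for very small n, over matrices of 0s and 1s applied modulo 2. The function names and the example classes are hypothetical.

```python
from itertools import combinations, product


def map_mod2(matrix, pattern):
    """Apply a 0/1 matrix to a 0/1 pattern, with arithmetic modulo 2."""
    return tuple(sum(a & b for a, b in zip(row, pattern)) % 2 for row in matrix)


def map_classes(matrix, classes):
    """Map every pattern of every class through the matrix."""
    return {label: {map_mod2(matrix, p) for p in points}
            for label, points in classes.items()}


def is_separable(classes):
    """True when no two different classes share a point."""
    return all(classes[a].isdisjoint(classes[b])
               for a, b in combinations(classes, 2))


def smallest_separating_map(classes, n):
    """Exhaustively search for a matrix with the fewest rows (output
    dimensions) whose images of the classes are still separable."""
    rows = list(product((0, 1), repeat=n))
    for m in range(1, n + 1):
        for matrix in product(rows, repeat=m):
            if is_separable(map_classes(matrix, classes)):
                return matrix
    return None  # the original classes were not separable


# Two hypothetical "letters" in a 3-dimensional binary space.
letters = {
    "A": {(0, 0, 0), (0, 0, 1)},
    "B": {(1, 1, 0), (1, 1, 1)},
}
```

For these two classes a single output coordinate suffices, because the second input coordinate alone already distinguishes them; the search cost grows as 2^(n·m), so anything beyond toy sizes needs the analytical conditions the paper refers to.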


Application of Miyakawa’s Multidimensional Sampling Theorem, 
III (I; February 24, 1961)—K. Sasakawa.


On the Periodicity of the Time-Envelope of Unharmonic Tones in 
Pitch Sensation (I; March 31, 1961)—T. Sugimoto. 


The pitch of complex tones such as vowels is predicted to be in 
close correlation with the time-envelope pattern of the waves. This 
is verified by experiments on pitch and periodicity, using unharmonic
tones. Two kinds of periodicity are found in the envelope of the
unharmonic tones. One, which appears more often than the other,
is found to be coincident with pitch sensation. In this paper, this
interesting character of the periodicity is derived by theoretically
calculating the periodicity of the time-envelope of simple unharmonic
waves. 


Mechanical Abstracting (A; March 16, 1961)—R. Tatenuma and 
N. Sugiura. 


Extraction of Pitch Signals in Voice (I; March 31, 1961)—H.
Wakabayashi. 

Pitch frequency plays an important role in various voice process- 
ings. In any case, pitch frequency must be extracted from the 
uttered voice at the outset. This paper discusses the general idea 
of pitch extraction and presents two actual examples. Properties 
of the voice pitch and relations between the pitch and the waveform, 
which must be clarified beforehand, are also considered. 


On All-Purpose Turing Machines with Minimum Size (A; February 
16, 1961)—S. Watanabe. 


The following papers appear in Volume 1 of the “Proceedings of the Symposium on Decision
Theory and Applications to Electronic Equipment Development (May 10-11, 1960),” held
at Rome Air Development Center, Griffiss Air Force Base, N. Y. The volume is published as
RADC Rept. No. TR-60-70A and is obtainable at the Office of Technical Services, U. S. Dept.
of Commerce, Washington, D. C. All papers are in English; authors’ affiliations are given below.


An Introduction to Bayes Decision Procedures—N. Abramson 
(Stanford University, Calif.). 


An introduction to statistical decision theory, viewed as an 
extension of game theory, is given. The two main differences between 
a decision theory problem (or a statistical game) and an ordinary
game are pointed out. From these considerations the three basic
components of decision problems are obtained and discussed. These 
components are 1) the a priori information, 2) the decision criterion,
and 3) the experiment. Examples are given to illustrate the methods 
of decision theory and to emphasize the importance of the three 
basic components. Changes in the decision problem produced by 
varying each of these three components are also illustrated. 


Signal Detection by Adaptive Filters—E. M. Glaser (Johns Hopkins 
University, Baltimore, Md.). 


Most of the work in communication engineering concerned with 
weak signal detection has been devoted to the study of the synthesis 
and performance of optimum, time-invariant filters and detectors, 
that is, to detection systems whose structure is fixed in a configura- 
tion which is optimum for the particular signal or class of signals 
that is to be received. The nature of the signal and the various 
statistical properties of the interfering noise need to be known to 
the designer before this class of detection system can be expected 
to yield any degree of satisfactory performance. The designer’s 
restriction to an invariant system is one of the reasons for this 
being so. Increasing attention is now being given to detection 
systems which are able to adapt their structure so as to be optimum 
for the particular detection problem of the moment. 

This paper describes a form of adaptive detection system suitable 
for the reception of a pulse signal whose waveform is fixed but is 
unknown at the receiver. The system is one which functions initially 
as an incoherent detector and, as it receives more and more pulses 
from a particular source, modifies its structure and optimizes its 
detection performance with respect to this signal. 


The Choice of Estimation Procedure and Loss Function for Radar 
System Synthesis—J. E. Keigler (Radio Corp. of America, Princeton, 
N. J.).

In recent years, much effort has been devoted to applying the 
techniques of statistical decision theory to the radar situation, 
particularly to the problem of determining whether or not a target 
is present, that is, the detection problem. The more interesting and 
difficult problem of inferring the properties of the target from the 
noise-corrupted returned signal, termed parameter estimation in 
the language of decision theory, is now receiving its share of atten- 
tion. One method of estimation, the Bayes procedure utilizing the 
bounded absolute error loss function, is presented in this paper as 
the most appropriate for the majority of radar situations. 


The Search and Detection Efficiency of Surveillance and Com- 
munications Devices Using Sequential Probability Ratio Analysis— 
G. W. Preston (General Atronics Corp., Bala Cynwyd, Pa.).


The present study was undertaken to determine the logical 
design—and detailed block diagram form—of the probability ratio 
sequential search radar and to compute its performance gain in 
effective transmitter power over nonsequential radars which are 
otherwise identical. In achieving the objectives of this program it 
proved necessary to extend the theory of Wald and subsequent
workers to take account of two important practical factors: 1) group
sampling and 2) detection losses. As a consequence of this theoretical
work, we are led to a noticeably different form of the detection
apparatus, which luckily gives performance appreciably
better than the so-called “Wald detector” in which group sampling
is not taken into account. When correctly applied, probability 
ratio sequential analysis characterizes the optimum search radar 
since it gives the maximum sensitivity for any specified average 
search rate and maximum average search rate for any specified 
sensitivity. 


A Sequential Multi-Decision Procedure—F. C. Reed (Planning 
Research Corp.).


This paper considers a sequential approach to the multidecision 
problem which requires the selection of one of a finite number k
of hypotheses H_i. By way of introduction, maximum likelihood
and minimum average-risk fixed-sample multidecision problems are
considered, and Wald’s classical sequential probability ratio test
for testing a simple hypothesis H_0 against a single alternative
H_1 is discussed. After this introduction, a generalized sequential
probability ratio is defined and rules set up for the sequential
acceptance of one of the k hypotheses H_i. It is shown that when
k = 2 this procedure reduces to the classical sequential test. The
rules for acceptance affect the errors in decision and the expected
sample size required for making a decision. By considering the 
sequential multidecision problem as a generalized random walk in 
k dimensions, one may construct the Markov matrix associated 
with the process, and compute the various errors in decision and the 
expected sample size. An error analysis is only feasible on a large- 
scale computer, but a numerical example illustrating the use of the 
technique is presented. 
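
For orientation, a sequential choice among k simple hypotheses can be sketched as below. This is not the generalized probability ratio of the paper, whose definition is not given in the abstract. It is a commonly used posterior-threshold rule, shown under the assumptions of equal prior probabilities and independent observations. The function name, the threshold value, and the Bernoulli example are hypothetical.

```python
import math


def sequential_multidecision(observations, log_likelihoods, threshold=0.99):
    """Accept hypothesis i as soon as its posterior probability (equal
    priors assumed) reaches `threshold`.

    log_likelihoods[i](x) must return log p(x | H_i) for one observation.
    Returns (index_of_accepted_hypothesis or None, samples_used).
    """
    k = len(log_likelihoods)
    scores = [0.0] * k          # accumulated log-likelihood of each hypothesis
    n = 0
    for n, x in enumerate(observations, start=1):
        for i, loglik in enumerate(log_likelihoods):
            scores[i] += loglik(x)
        peak = max(scores)      # subtract the peak before exponentiating
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        best = max(range(k), key=lambda i: weights[i])
        if weights[best] / total >= threshold:
            return best, n
    return None, n              # data exhausted without a decision


def bernoulli_loglik(p):
    """Log-likelihood of a single 0/1 observation with P(1) = p."""
    return lambda x: math.log(p if x else 1.0 - p)


# Three hypothetical hypotheses about the success probability of a trial.
hypotheses = [bernoulli_loglik(p) for p in (0.2, 0.5, 0.8)]
```

With k = 2 and equal priors, a posterior threshold t is equivalent to a likelihood-ratio threshold t/(1 − t) in either direction, so this rule becomes a symmetric form of the classical two-hypothesis sequential test.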


The Decision Problem in Radar—L. S. Schwartz (New York 
University, N. Y.). 


Well-known decision tests for use in the detection of noisy signals 
in receiving systems depend on a priori probabilities and likelihood 
statistics. Without such knowledge decision thresholds cannot be 
validly prescribed and error probabilities computed. In the usual
treatment of detection, one assumes that the necessary information 
is available, thereby avoiding many difficulties of an a priori and 
predictive nature. It is the objective of this paper to examine the 
problem of detection for various degrees of deficiency in the knowl- 
edge considered necessary for use of customary methods and to 
indicate what can be done by other means. 

Inductive probability is shown to have significance for radar 
from the standpoint of reliability, and a method for implementing 
inductive systems which function efficiently with unknown and 
randomly varying channel statistics is presented.


The Resolvability of Point Sources—P. Swerling (The RAND 
Corp., Santa Monica, Calif.) 

The problem investigated is the resolvability, in the presence 
of noise, of point sources which are separated in azimuth by less 
than the width of the main lobe of the gain pattern. The probability 
of correctly resolving two sources is derived for a particular class 
of decision methods as a function of the strengths of the sources, 
their angular separation and noise level. Numerical results are 
presented for a case which corresponds more accurately to optical 
or infrared devices than to radar. 


Lectures on Communication System Theory—Elie J. Baghdady,
Ed. (McGraw-Hill Book Co., Inc., New York, N. Y.; 1961.) 


“In the last analysis, radio communications systems are designed
by ‘seat-of-the-pants’ engineering—let no forest of formulae in this 
or any other chapters of this book suggest otherwise.” 

This statement, taken from Chapter 2 of the book by D. G. 
Brennan, is largely true, and the resulting implications are worthy 
of some examination. This situation certainly merits comment, 
especially before reviewing a book that does indeed display “a 
forest of formulae” in its 617 pages and whose title implies that
the contents are concerned with communication systems. 

In the past two decades, much effort has been expended and 
some notable progress has been made in an area that some choose 
to describe as “statistical communication theory.” The works of
Wiener and Shannon are invariably (and properly) given special 
mention in discussions of this nature, but there are certainly a 
large number of other people who have also made significant con- 
tributions to the over-all effort. In view of the attention, effort, 
talent, and results that have been associated with statistical com- 
munication theory, one could logically expect to find, at this rela- 
tively late date, communications system engineering to be heavily 
influenced by statistical communication theory. The least to be 
expected is that several of the key results from the theory have 
been adopted by the engineer with beneficial results in practical 
application. Unfortunately, neither of these events has transpired, 
nor is there any indication that the theory will influence the practice 
to any great extent in the near future. 

It is often said that good theory and good practice go hand in 
hand. How then do we account for this apparent void between the 
engineer and the theorist in communications work? That this void 
exists, there is little doubt; the problem has been discussed quietly 
but persistently for several years. If one chooses to listen to extreme 
positions in this matter, a rather poor opinion of both groups might 
result. The theorist has claimed that the engineer is a technical 
reactionary, unable or unwilling to accept or try new approaches, 
preferring the familiarity and relative safety of proven and generally 
accepted techniques. The theorist, on the other hand, has been 
accused of being disinterested in the real challenges which exist, 
preferring instead to invent problems for which he has or wishes to 
find solutions. Indeed, more than one engineer, after studying the 
theoretical literature in communications of recent vintage, has come 
away convinced that many of these writers are merely using com- 
munications as an excuse in order to display one type of mathe- 
matical proficiency or another. 

If forced to comment on such a debate, this writer would agree 
with both sides. In spite of the self-congratulatory writings one 
finds so often these days in both theoretical and applied communi- 
cations journals, the fact still remains that the communications 
systems we are building today are the same ones we were building 
20 years ago. The only real difference between the systems of today 
and those of two decades ago is components. Certainly there has 
been much progress in the theory of communications, but where 
are the applications? Where are the great ideas that stimulate 
action instead of mere discussion? Where are the men of revolution 
such as Marconi, DeForest, and Armstrong? 

Perhaps our trouble today is, in part, the very fact that communi- 
cations people tend to drift either into application or theory. The 
future may well belong to neither group, but rather to those in- 
dividuals who are able and willing to master both disciplines. The 
handbook engineer, while still quite useful, is not likely to provide 
the spark that will push us ahead. The strict theorist is all too prone 
toward an involvement with models and mathematics that may 
be elegant, interesting, and challenging, but still trivial (if not 
irrelevant) from a basic application point of view. 

A refreshing example of the results which may be obtained by 
combining creative thinking, theory, and engineering may be found 
in a recent publication¹ by Kotel’nikov. This brilliant Russian
scientist certainly knows his theory, but he never becomes obsessed
by it. The most impressive thing about Kotel’nikov’s book, to this
writer, is the fact that the ultimate goal of all work, application, is
never once lost from sight. If the Soviets have managed to give this
man authority without turning him into an administrator, then we
should not be surprised to find ourselves behind in communications
as well as in rocketry.

¹ V. A. Kotel’nikov, “The Theory of Optimum Noise Immunity,” McGraw-Hill
Book Co., Inc., New York, N. Y.; 1959.

It would be pleasant to report that the book being reviewed 
continues the good work of Kotel’nikov in bringing theory and 
practice more closely together. Such is not the case; the book does 
more to demonstrate the current situation in communications than 
to alter it. Communication systems will still be designed by
“seat-of-the-pants” engineering, the “forest of formulae” to the contrary.
This is hardly a criticism but merely a statement of probable fact, 
for, when viewed as a report on the current status of communica- 
tions philosophy, the book succeeds rather well. One interesting 
attempt in the book, to reduce a popular theory to practice, ends 
in disaster but perhaps teaches a lesson. I refer here to Section III 
of Chapter 11 by R. M. Lerner. The capacity formula for a noisy 
channel is used in an attempt to contradict the engineering axiom 
concerning the exchange of data rate for an improved error rate. 
Lerner demonstrates an application of the formula by the use of a 
multisymbol alphabet of optimally-chosen waveforms. Of course, 
Lerner fails to reach the promised land of arbitrarily small error 
rate because of the rapidly diminishing return in power gain as 
the alphabet size becomes large. Admitting the obvious impracti- 
cality of continuing such an approach, Lerner then suggests that 
the symbol number can be limited and that further improvements 
may be obtained by a digital coding scheme. This certainly is no 
answer since digital coding, under the conditions assumed, repre- 
sents merely a less efficient extension of the multisymbol technique 
already in use. One can only conclude at this point that the old 
engineering axiom has survived rather well. Now, do we conclude 
that Lerner has failed or could it be that the theory has failed? 
Is it not somewhat presumptuous to assume that every theory has 
a useful application? Has it ever been shown that information 
theory, for example, has more than a vocabulary relationship to 
communications? Lerner declines to discuss questions such as 
these that his work very obviously suggests. We shall follow his 
wise example and return at once to the business of review. 

The subject book edited by Baghdady certainly reflects the wide 
range of talents which have been attracted to communications
work. The contents vary from application of Shannon’s information 
theory in feedback communications systems to an engineering 
comparison of active and passive satellite relay stations. The book 
is essentially a collection of papers resulting from the M.I.T. summer 
program (1959) on “Reliable Long-Range Radio Communication.”
The 18 authors are recognized authorities in their respective special- 
ities, and the reader can appreciate the effort required to edit the 
contributed material into the 23 chapters which make up the book. 
Baghdady’s editing work is certainly adequate for publication 
purposes, and this is probably all the editing needed since most 
readers will refer to selected chapters rather than treat the book 
as an entity. For the most part, the chapters can be read individually 
without need for outside reference. 

In addition to editing, Baghdady has contributed a chapter on 
diversity techniques and one on analog modulation systems. The 
first of these represents a good treatment of a popular and important 
subject. Selection, maximal-ratio and equal-gain diversity systems 
are analyzed and discussed. The accompanying mathematics is 
adequate and, in addition, practical considerations are given a good 
exposition. His second chapter on analog modulation systems runs 
over 100 pages in length and would almost qualify as a book in 
itself. Linear modulation (AM and derived systems) is considered, 
but the main effort is reserved for exponential modulation systems 
and, more specifically, FM. FM, like most other nonlinear modula- 
tion techniques, involves rather difficult systems analysis problems. 
As a consequence, this chapter is not particularly easy to read. 
Writing on a subject such as this presents a real challenge, since a 
good measure of engineering judgement is required to supplement 
direct analysis. It is always interesting to observe how good engineers
attack problems for which exact analyses are impossible or im- 
practical. Thus, the reader who is not particularly concerned with 
FM can study this chapter as a demonstration of engineering 
method. Those who are concerned with FM problems, especially 
with regard to some of the new detection techniques and the claimed 
improvements in threshold level, will want to give this material a 
careful reading. Baghdady’s claims for FM performance have 
aroused some controversy of late, which gives this chapter added 
interest. Regardless of the side one chooses to take in this matter, 
Baghdady must be credited with putting his ideas in print where 
all may examine them in detail. 

W. M. Siebert has contributed one chapter on linear, time-invariant 
signal processes and one on decision theory. The first of these covers 
familiar ground, such as linear networks, the sampling theorem, and 
band-limited, white, Gaussian noise, in a brief but professional 
manner. The second chapter by Siebert on decision theory represents 
an excellent exposition of the fundamentals of this approach. This 
chapter is highly recommended to engineers who are curious about 
this area of theoretical endeavor, but who have found difficulty in 
the past in obtaining readable material on the subject. A more 
general formulation of decision theory is given in a following chapter 
by W. L. Root. Both Siebert and Root give examples of applications 
of decision theory to communications problems. 

The representation and design of signals are topics covered in two 
chapters by R. M. Lerner. In much communications work, the form 
of the transmitted signal is usually predetermined by convention 
or regulation. The analysis work involved in such cases usually 
relates to the optimum recovery or decision-making process, given 
a set of transmitted waveforms. Lerner’s work is concerned with 
the more general problem of determining the transmission waveforms 
to be used for best performance in a given situation. This is a 
relatively new approach, and Lerner discusses some interesting 
problems. Multisymbol systems are treated for the case of a white, 
Gaussian channel disturbance, and the cases of impulse and other 
non-Gaussian disturbances are also considered. The non-Gaussian 
nature of noise found at the lower radio frequencies and on telephone 
lines presents some interesting systems problems, and a reading of 
Sections V and VI of Chapter 11 is especially recommended. 

The difficulties associated with the use of a time-varying trans-
mission medium are a topic that receives attention in four chapters
of the book by W. E. Morrow, Jr., H. Sherman, T. Kailath, and 
J. M. Wozencraft. For the most part, these writers are aware of the 
danger of taking mathematical models of the medium too seriously, 
and suitable cautions to the reader are mentioned. This reviewer 
feels that such qualification is most important since many media 
are not only statistical in nature, but are also nonstationary to a 
discouraging degree. 

The currently very popular subject of coding is represented in 
a chapter by P. Elias who gives some special consideration to 
erasures or null-zone detection techniques. In another chapter, 
P. E. Green, Jr., considers feedback communication systems from 
a strict Shannon sense. Predecision feedback is used and some 
interesting results are derived concerning the effect of the feedback 
link on channel capacity and possible system simplifications offered 
by the feedback link for a given level of performance. 

Receiving system noise considerations are well treated by R. P. 
Rafuse and R. H. Kingston, and an associated chapter by A. 
Uhlir, Jr., should be welcomed by those interested in a survey of 
semiconductor applications at microwave frequencies. Another fine 
bit of survey work may be found in the chapter by I. Pollack 
concerning the performance criteria of speech systems. 

The above review omits some of the content, but the reader 
should have a good idea of the wide variety of topics covered. 
Each chapter is followed by a list of pertinent references which 
appear quite useful. 
JOHN P. COSTAS
13 Edgewood Dr. 
Fayetteville, N. Y. 


An Introduction to Statistical Communication Theory—David 
Middleton (McGraw-Hill Book Co., Inc., New York, N. Y.; 1960. 
1070 pages + 20 index pages + xix pages + 7 bibliography pages 

Some years back, an English statesman on a tour of the United
States visited, among other places, Mt. Rushmore National Monu- 
ment. He is said to have stared at the four presidential faces, carved 
out of the mountain, for some time. Finally, he turned to his com-
panions and exclaimed, “Marvelous, but is it art?” We feel that
we are in the same sort of predicament in trying to categorize 
Middleton’s mountainous treatise. There have been technical books 
with more pages than this one (though not many), and there have 
certainly been technical books covering a wider range of topics 
(though few have been at such an advanced level). Few technical 
books, however, can claim to be packed with such a weight of 
material as this one. 

This “introduction” to statistical communication theory has a 
distinctly nonintroductory set of prerequisites. The preface warns 
“little space is devoted to the probabilistic and statistical 
background, the elements of which the reader is assumed to possess. 
A knowledge of Fourier- and Laplace-transform methods, contour 
integration, matrices, simple integral equations, and the usual 
techniques of advanced calculus courses is also required, along with 
the elements of circuit theory and the principles of radio, radar, 
and other types of electronic communications systems.” 

If the preparations for this extended tour of statistical communi- 
cation theory are somewhat demanding for an introduction, it must 
be admitted that almost all the points of interest are covered. Some 
of the special topics omitted are mentioned in the author’s preface. 
These include feedback communication systems, coding theory, 
and sequential decision theory. 

We feel that the title is a misleading guide in trying to categorize 
this book. Middleton states three purposes for the book in his 
preface: 1) “to outline a systematic approach to the design of
optimal communication systems ...”, 2) to unify earlier work and
3) to provide a text. Let us discuss the contents of the book in view 
of these purposes. 

Just as Mt. Rushmore, Middleton is divided into four parts. The 
first, “An Introduction to Statistical Communication Theory,” is 
certainly the weakest. This part consists of a good deal of intro- 
ductory material of an expository nature. The basic concepts 
involved in random variables, random processes, expectations, 
linear and nonlinear systems, spectra, correlation, sampling and 
SNR are treated. Most of this material is covered in a less complete 
but more illuminating fashion in Davenport and Root. There is 
an encyclopedic quality to this book which makes itself felt primarily 
in Part I. Seventy-five interminable pages are spent in various 
definitions of (omitting some duplicating terms) the autocorrelation 
function, the autovariance function, the autocorrelation coefficient, 
the autovariance, the intensity, the intensity spectrum, the average 
power density, the intensity density, the spectral intensity distribu-
tion, and the spectral density for random, deterministic and “mixed”
processes. Several types of Wiener-Khintchine theorems are dis- 
cussed in these seventy-five pages but the important topic of 
estimation of the spectral density is left for later. 

Chapter 4 which deals with sampling, interpolation and random 
pulse trains provides an unfortunate example of a heavy-handed 
treatment of a subject which deserves better. This chapter contains 
a large number of results on the spectra of random pulse trains, 
many of which we have not seen in print before. It does not, however, 
provide the reader with the insight demanded in such expository 
material. Middleton has always been a master at solving problems 
and it is this talent which he applies in Chapter 4. A book of this 
sort, however, has a higher responsibility than that of providing 
answers to questions and of solving problems. It has the responsi- 
bility of providing insight to the class of problems treated. It has 
the responsibility of examining the answer and asking, “why?” This
last question is rarely asked and this is a disappointment in Chapter 
4, as well as in the rest of the book. The last chapter of Part I 
presents a 40-page introduction to information theory. Shannon’s 
two fundamental theorems are stated but not proved in this chapter. 
In general, the material compares unfavorably with roughly equiv- 
alent material in Feinstein. 

Part II, “Random Noise Processes,” starts with a discussion
of the normal random process and processes derived from the normal. 
The book may be said to come alive with Chapter 9, “Processes
Derived From the Normal.” Properties of normal narrow-band
signals and noise are derived together with those of signals and 
broad-band normal noise processes. A few results on zero crossings 
and extrema of a random process are also presented. Many of the 
results of this well-organized chapter are presented in a large collec- 
tion of problems which amplify the results in the book. 

Chapters 10 and 11 deal with the physical origins of the random 
processes treated in the rest of the book. The Langevin, Fokker- 
Planck and Boltzmann equations are examined. Finally, models 
of thermal, shot and even impulse noise are formulated and various 
properties of these models are obtained. A great many results from 
a wide variety of sources are combined into the most readable 
treatment of noise generation we have yet seen. 

Part III, entitled “Applications to Special Systems,” is devoted
to modulation, demodulation and Wiener theory. Much of Part III 
is taken from Middleton’s earlier papers in modulation and demodu- 
lation theory. Chapters 12 and 14 provide an extensive treatment 
of the second-order statistics of amplitude-modulated and frequency- 
modulated signals. Chapters 13 and 15 discuss in astounding detail 
the detection of such signals in the presence of noise. These four
chapters will undoubtedly constitute the standard work on such 
problems for many years. 

Chapter 16 treats some measurement problems, Wiener filters 
and matched filters. Chapter 17 covers a miscellany of distribution 
problems, some of which are needed in the last part of the book. 
The material of Chapter 17, perhaps because it is somewhat apart 
from the main stream of the book, does make interesting reading; 
e.g., did you know (page 746) that the distribution of the finite-time 
estimate of the spectral density of a Gaussian random process 
becomes a simple exponential as the observation time increases? 

Part IV, “A Statistical Theory of Reception,” is concerned with 
the application of statistical decision theory to communication 
problems. Chapter 18 serves to provide an introduction to statistical 
decision theory. Little selection of material has been used in this 
chapter and it seems unnecessarily involved for the purposes to 
which it is put. In the next two chapters, Middleton considers first 
the general theory of binary detection systems and then a wide 
variety of examples of such systems. The famous Middleton thres- 
hold binary detection expansion rears its ugly head at the end of 
Chapter 19 and most of Chapter 20 is, unfortunately, based on 
this material. 

The most basic problem in the application of statistical decision 
theory to the detection of continuous signals is without doubt the 
detection of a known signal in Gaussian noise. Not only is this 
problem important in its own right but it may be used as the starting 
point of almost all other signal detection problems for which a 
solution is known. One need only average the likelihood ratio for 
the known signal over the ensemble of possible signals in order to 
obtain these results. Middleton’s chapter on examples of binary 
detection systems would have made more sense if he had not deferred 
treatment of this basic problem until he had given more than fifty 
pages of examples based on his threshold expansion. It seems to me
that the threshold expansion dominates Chapters 19 and 20 to an
extent which is completely unwarranted. 
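
The averaging step described above can be written out as a short sketch. It assumes, purely for illustration, additive white Gaussian noise of two-sided spectral density $N_0/2$, an observation interval $(0, T)$, a known signal $s(t)$, and a random signal parameter $\theta$ with prior density $p(\theta)$; none of these symbols is taken from Middleton's text.

```latex
% Likelihood ratio for a completely known signal s(t) in white Gaussian noise
\Lambda[y] = \exp\left\{ \frac{2}{N_0}\int_0^T y(t)\,s(t)\,dt
                       - \frac{1}{N_0}\int_0^T s^2(t)\,dt \right\}

% Signal known only up to a random parameter \theta:
% average the known-signal likelihood ratio over the ensemble of signals
\bar{\Lambda}[y] = \int \Lambda[y \mid \theta]\; p(\theta)\, d\theta
```

Under these assumptions the first expression is the starting point, and the second shows why results for signals with unknown parameters follow from it by averaging alone.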

Chapter 21 presents a number of general results in estimation 
theory applied to signal extraction. A brief and unsatisfactory 
chapter deals with information measures and reception and a final 
chapter mentioning some “Generalizations and Extensions” con-
cludes the main body of the book, but that is not all!

After the 23 chapters outlined above, Middleton has provided 
70 pages consisting of: 1) an appendix on “Special Functions and 
Integrals,” 2) an appendix on “Solutions of Selected Integral
Equations,” 3) “Supplementary References and Bibliography,”
4) an eight-page “Glossary of Principal Symbols,” 5) “A Name Index,”
and 6) a remarkably complete and well-organized “Subject Index.”

Let us return now to the three aims of this book as outlined in 
the preface. A systematic approach to the design of optimal com- 
munication systems was outlined by Middleton and Van Meter in 
1955. The present book does provide a more unified treatment of 
much of Middleton’s early work. This is especially true of his 1946 
and 1950 papers on modulation theory and of his pioneering work 
in the applications of statistical decision theory. The more detailed 
treatment and the many examples in the book, however, can hardly 
be said to add to the original systematic approach. The value of 
this book as a text is, we feel, almost nil. The difficulties previously 
mentioned together with the added difficulty of learning funda-
mental ideas from a book which needs an eight-page “Glossary
of Principal Symbols” do not recommend this book to the novice.

The value of this book is in its scholarship and not in its pedagogy. 
The examples given, the problems provided, and the complete 
references at the end of each chapter make “An Introduction to
Statistical Communication Theory” a necessity for the research
worker in statistical communication theory. 


NORMAN ABRAMSON
Electrical Engrg. Dept. 
Stanford University 
Stanford, Calif. 


Korrelationselektronik (Correlation Electronics)—F. H. Lange
(VEB Verlag Technik, Berlin, Germany; 1959. 320 pages + 8 
index pages + 15 bibliography pages. Illus. 6 X 8 1/2. In German.) 


This book has two main parts, the first of which lays the founda-
tions of “correlation analysis.” The first two chapters summarize
the mathematical definitions and principal methods involved in 
the analysis of correlation coefficients, correlation functions, ampli- 
tude spectra and power spectra. Although no new material seems 
to be presented, this reviewer enjoyed reading the 80-page summary 
for two reasons: first, the mathematics is written in terms familiar 
to the modern radio engineer. The author seems to sympathize with 
readers (like this reviewer), who prefer not to plow through page 
after page of mathematical formalism when results can be obtained 
in a straightforward manner, perhaps at the expense of a little less 
generality. Second, each mathematical result is accompanied by a 
clear description of its meaning and significance. Thus, the average 
engineer will understand the highlights of the first two chapters 
without difficulty. In this reviewer’s opinion, the author is less 
successful in his attempt to establish within the framework of
general communication theory certain boundaries between “correlation”
analysis, information theory, statistics, spectral theory,
signal theory, etc. The third chapter is devoted to the problem of 
instrumenting correlators and devices for the measurement of cor- 
relation functions. Low-pass as well as band-pass devices are treated. 
Also included is an extensive discussion of the errors which occur in 
correlators as a consequence of imperfections in multiplier char- 
acteristics. 

The second part of the book deals with the application of the 
“correlation analysis” in the field of communications and contains
three chapters. The fourth chapter examines the extent to which 
the correlation function characterizes signals generated by informa- 
tion sources. The concept of the correlation interval is introduced, 
and is followed by a discussion of the autocorrelation function 
of speech sounds. Two-dimensional autocorrelation functions are 
introduced in connection with pictures and printed material. The 
main theme of the fourth chapter is that in many cases a direct 
determination of the spectrum is difficult; but the autocorrelation 
function might be found easily, in which case the Wiener theorem 
gives the spectrum. The author demonstrates this with the aid 
of examples in telegraphy, television, and thermal noise. A discussion 
of the cross-correlation function, correlation matrix, and the rules 
governing the addition of partially-correlated signals concludes the 
fourth chapter. The fifth chapter is concerned with the correlation 
analysis of linear and nonlinear transmission systems. The cross- 
correlation function between input and output of linear networks is 
derived. The mean-square error of linear servo systems is treated 
as an illustrative example. Diversity reception in the presence 
of fading with partially correlated paths is treated next. Two 
nonlinear problems have been selected by the author, noise through 
nonlinear four-terminal networks and various forms of modulators. 
The last chapter describes various types of correlation receivers. 
The theory of autocorrelation detectors, the problem of detecting 
periodic signals buried in noise, and an exposition of the matched 
filter concept open this chapter. Then the author uses the following 
examples to illustrate the application of cross-correlation detectors: 
the RAKE teletype system, the pattern recognition of decimal 
digits and finally the post-detection interferometer which has been 
used by radio astronomers. 

The list of references contains 335 entries covering the time 
period up to the middle of 1958. 

The author succeeds in his main purpose. This book is a well- 
written summary of the significance of the correlation function as 
a powerful tool in the study of communication systems and as a
basis for the design of various devices. The book brings together 
material which is widely scattered in the literature, and it does not 
require a high degree of mathematical sophistication on the part 
of the reader. The author did not succeed, however, in convincing
this reviewer that “correlation electronics” represents a separable
entity within the framework of general communication theory. 


SIEGFRIED H. REIGER
RAND Corp. 
Santa Monica, Calif. 


Sequential Decoding—J. M. Wozencraft and B. Reiffen (The 
Technology Press of the Massachusetts Institute of Technology 
and John Wiley and Sons, Inc., New York, N. Y.; 1961. v + 74 
pages. $3.75) 


This book reports on important research carried out at M.I.T. 
and Lincoln Lab. during the past four years in the area of error- 
correcting codes. It is based on the doctoral theses of the authors 
and on work by M. Horstein and R. G. Gallager. It presents a
new and possibly promising approach to the problem of error 
control that is basically different from the block code scheme 
which has dominated the intensive research on error-correcting 
codes to be found in the literature in recent years. We first list the 
contents of the book, then describe in more detail the results 
presented, make some passing comments, and finally return to a 
critique of the book. 

Sequential Decoding begins with a seven-page chapter discussing 
the general role of coding in communication as viewed by Informa- 
tion Theory. Chapter 2 presents an excellent short summary of 
certain known general results concerning the use of block codes on 
the binary symmetric channel. Bounds on the asymptotic rate of 
decrease of error probability are derived. The main theoretical 
results for the new error control scheme as applied to the binary 
symmetric channel are presented in Chapters 3 and 4. The former 
treats the reception procedure; the latter deals with the encoding 
method. Chapter 5 contains the interesting experimental results of 
the operation of a binary communication system using the sequential 
error control method as simulated on a large-scale computer. 
Chapter 6 presents most briefly some extensions of the scheme to 
more general channels. An appendix containing some bounds on 
the probability distribution for the sum of identically distributed 
independent random variables concludes the text. 

As little on sequential decoding has previously appeared in the 
readily available literature, a few technical words about the scheme 
are in order here. In the strict confines of a book review this descrip- 
tion must necessarily be incomplete. We give only the gist of the 
idea and point out how it differs from the block code approach. 

A message source produces a stream of symbols suitable for 
transmission over a discrete memoryless noisy channel. In the 
block code scheme first described by Shannon, this sequence of 
message symbols is segmented into disjoint adjacent blocks, each 
block being k symbols in length. A fixed encoding dictionary 
translates each such block into a sequence of n > k symbols which 
are then transmitted over the channel. The received stream of 
channel symbols is segmented into disjoint adjacent blocks of n 
symbols and a fixed decoding dictionary translates each received 
block into a block of k message symbols. Successive blocks of 
digits are encoded and decoded independently; the dictionaries 
remain fixed. Shannon’s famous coding theorem asserts that (under 
certain restrictions) the probability of error in the decoded symbols 
can be made as small as desired by making n and k large while 
maintaining R = k/n fixed and by choosing suitable dictionaries 
for encoding and decoding. 
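
The block scheme just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary is a hypothetical repetition code (not a code discussed in the book), symbols are binary, the decoding dictionary is realized as a nearest-codeword search, and the message length is taken to be a multiple of k.

```python
from itertools import product


def hamming_distance(a, b):
    """Number of positions in which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def make_repetition_dictionary(k, repeat):
    """Hypothetical fixed encoding dictionary.

    Every k-symbol message block maps to a word of n = k * repeat symbols,
    so the rate is R = k / n = 1 / repeat.
    """
    return {
        msg: tuple(bit for bit in msg for _ in range(repeat))
        for msg in product((0, 1), repeat=k)
    }


def encode(stream, dictionary, k):
    """Segment the message stream into adjacent k-blocks and translate each."""
    out = []
    for i in range(0, len(stream), k):
        out.extend(dictionary[tuple(stream[i:i + k])])
    return out


def decode(received, dictionary, n):
    """Segment the received stream into n-blocks; decode each independently
    by choosing the message whose code word is closest in Hamming distance."""
    out = []
    for i in range(0, len(received), n):
        block = tuple(received[i:i + n])
        best = min(dictionary,
                   key=lambda m: hamming_distance(dictionary[m], block))
        out.extend(best)
    return out
```

With k = 2 and repeat = 3 (so n = 6 and R = 1/3), a single channel error in each received block is corrected. Note that each block is handled with no reference to its neighbours, which is the feature the sequential scheme gives up.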

In the scheme described by Wozencraft and Reiffen, encoding 
is accomplished by a “convolution code of constraint length n.” 
If the rate to be maintained is R = 1/n_0 with n_0 an integer, then n
must be chosen as an integral multiple of n_0, say, n = kn_0. As each
message symbol is presented to this convolutional encoder, it
produces n_0 symbols for transmission on the channel. The n_0 symbols
chosen depend in a stationary way on the last k message digits 
presented to the encoder. The convolution encoder can be thought 
of as a window k symbols wide that is slid along the message symbol 

stream. As the window is moved one symbol upstream, n_0 channel
symbols are produced. These channel symbols are a fixed function 
of the message symbols appearing in the window. 
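
The sliding-window picture of the encoder can be made concrete with a small sketch. It assumes binary symbols and takes the "fixed function" of the window contents to be a set of mod-2 sums; the tap patterns below are hypothetical and chosen only for illustration, not taken from Wozencraft and Reiffen.

```python
def convolutional_encode(message, taps):
    """Rate 1/n_0 binary convolutional encoder viewed as a sliding window.

    taps is a list of n_0 tuples, each of length k (hypothetical generator
    patterns). The window holds the last k message digits, newest first,
    and is assumed to start filled with zeros.
    """
    k = len(taps[0])
    window = [0] * k
    out = []
    for digit in message:
        # Slide the window one message symbol along the stream.
        window = [digit] + window[:-1]
        # Emit n_0 channel symbols, each a fixed mod-2 function of the window.
        for tap in taps:
            out.append(sum(t & w for t, w in zip(tap, window)) % 2)
    return out


# Example with k = 3 and n_0 = 2, so the constraint length is n = k * n_0 = 6.
EXAMPLE_TAPS = [(1, 1, 1), (1, 0, 1)]
```

For the message 1, 0, 1, 1 these taps give the channel sequence 11 10 00 01: two channel symbols per message symbol, each pair depending on the current digit and the two before it.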

The sequential decoder can also be thought of as a window— 
one that is slid along the stream of received symbols. This time the 
window is n = kn_0 symbols wide; it is moved along the stream not
symbol by symbol but in steps of n_0 symbols at a time. At each step
of the window, a decoded message symbol is produced by the 
decoder. This decoded message symbol is the result of a computa- 
tional procedure P performed on the n received symbols available 
in the window at the time. The procedure P itself does not change 
(in the simplest scheme described) as the decoder window is slid 
n_0 symbols upstream. Input data to P, however, are the previous
k-1 decoded message symbols as well as the n received symbols
available in the window. It turns out then that the amount, N,
of computation (measured in an appropriate way) done under P 
for a fixed position of the decoding window is a random variable 
that depends upon the entire past history of noise symbols on the 
channel and on the message symbols previously presented for 
transmission. Details of P are too complicated to permit their 
description here. 

In Chapter 4, the authors show how to generate convolution 
codes for arbitrary n_0 and k. The codes are easily instrumented.
In Chapter 3, they present proofs for the following statements for 
the case of transmission over the binary symmetric channel: 


1) Under certain restrictions on the rate R (R < half channel
capacity is sufficient), there exist sequences of convolution codes
of increasing constraint length n such that by using the sequential
decoding procedure P the error probability of a decoded message
symbol can be made vanishingly small, given that the preceding
k-1 decoded message symbols have been decoded correctly. Indeed,
this error probability can be made to decrease exponentially
with n. Note that the condition on the preceding k-1 symbols makes
this statement weaker than the corresponding result known for
block codes.

2) The expected value of the number N of computations per message
digit performed under P grows less rapidly than some small power
of n. (Complete proof of this statement is not given in the book.
Rather, proof is presented that a certain subset of the operations
in P are so bounded, and the reader is referred to Reiffen’s thesis
for proof that the remaining set is similarly bounded.)


At first glance, statement 2) above seems in marked contrast to
the situation found with block codes where dictionary size grows 
exponentially with n. But this raises a sort of philosophical point. 
In what sense is it fair to compare dictionary size with the number 
of computations made by the receiver per decoded digit? For parity- 
check block codes the actual computation done per block of n received 
symbols is: 1) the evaluation of n-k linear parity checks, 2) search
in a dictionary of size 2^(n-k) for a given word. If one regards the
dictionary as permanently wired into the receiver by its maker, 
then 2) requires no computation by the receiver at all! Since k 
decoded message symbols are produced for each received block, 
the number of computations performed by the parity-check block-
code receiver per message symbol is proportional to (n-k)/k = 1/R - 1,
a quantity that does not depend on n. This analysis hedges on the 
fact that to wire in the dictionary may be an enormous job, in fact, 
with present knowledge of block codes, it becomes prohibitively 
costly for sufficiently large n, let us say n in the twenties with present 
computer technology. But then, if the receiver is to remain in con- 
stant use for years on end, very large effort on its initial construction 
may indeed be justified. The receiver may be very “complex,” but
it does not do much “computing.” In one sense, much of the recent
work on block parity-check codes can be viewed as an effort to 
effect the original organization of the decoding dictionary so that 
it can be wired into the receiver in a particularly simple, orderly way. 
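
The two-step receiver discussed above (evaluate n-k parity checks, then look up a word in a dictionary of 2^(n-k) entries) can be sketched as follows. The parity-check matrix is that of the (7,4) Hamming code, used here only as a convenient hypothetical example; the dictionary is built once, which stands in for "wiring it in," and maps each parity-check result to a lowest-weight error pattern.

```python
from itertools import product

# Hypothetical example: parity-check matrix of the (7,4) Hamming code,
# so n = 7, k = 4 and there are n - k = 3 parity checks.
H = [
    (1, 0, 1, 0, 1, 0, 1),
    (0, 1, 1, 0, 0, 1, 1),
    (0, 0, 0, 1, 1, 1, 1),
]
N = 7


def syndrome(word):
    """Step 1: evaluate the n - k linear parity checks (mod 2)."""
    return tuple(sum(h & w for h, w in zip(row, word)) % 2 for row in H)


def build_dictionary():
    """Built once, ahead of time: one entry per possible syndrome,
    2^(n-k) in all, each holding a lowest-weight error pattern."""
    table = {}
    # sorted() is stable, so patterns are visited in order of weight.
    for pattern in sorted(product((0, 1), repeat=N), key=sum):
        table.setdefault(syndrome(pattern), pattern)
    return table


DICTIONARY = build_dictionary()


def correct(received):
    """Step 2: a single dictionary lookup, then remove the error pattern."""
    error = DICTIONARY[syndrome(received)]
    return tuple(r ^ e for r, e in zip(received, error))
```

Here the per-block work at reception time is three parity checks and one lookup, while all the effort that grows with n sits in `build_dictionary`, which is the distinction between a receiver that is complex and one that computes a great deal.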

Since N is a random variable in the sequential scheme, its variance 
as well as its mean is of considerable interest. Theoretical bounds 
on this variance do not seem to be obtainable, but experimental 
information on this quantity is given in Chapter 5 which reports 
on simulation studies of a sequential system. The variability of N
is not negligible, and the authors suggest modifications of their 
scheme to reduce this variability. To this reviewer, the results of 
the simulation were encouraging, but not conclusive. Much work 
remains to be done in this field, and it is much too early to make 
firm statements about the ultimate relative practical merits of 
block and sequential decoding methods. 

I found Sequential Decoding a very difficult book to read. A large 
part of the difficulty is due to the intrinsic complexity and difficulty
of the subject. (After the first reading performed for this review 
there are still many points of proof I do not understand.) Another 
part of this is no doubt due to intellectual shortcomings of my own. 
I feel, however, that a third and not entirely negligible part is due 
to the organization and exposition by the authors. Words are 
sometimes used carelessly and subtleties in the logical foundation 
of proofs are treated much too lightly. Chapter 4 should precede 
Chapter 3, and I strongly advise a reader embarking on this latter 
chapter to study Fig. 3.38 and the synopsis section on page 45 
before setting sail. The book could benefit by expansion in many 
sections. The point made in Section 4, page 66, is much too important 
to be relegated to the last page of the text and hidden among briefly 
treated extensions. One last critical remark. What purpose could 
the publisher have in mind in omitting all punctuation in formulas
in this and other recent books in the Technology Press Monograph 
Series? 

Despite its difficulty, Sequential Decoding is a must for any 
serious worker in the field of error control. The research reported is 
impressive and significant. For substantial depression of error 
probability (large n), the method proposed seems at present to be 
clearly competitive with if not superior to other methods known 
to me. What method will ultimately prevail is not clear. The field 
is still changing rapidly. 


D. SLEPIAN 
Bell Telephone Labs. 
Murray Hill, N. J. 